RSS/XML news feed headline grabber!

script written by

Description:

This script will parse and process RDF/XML feeds, using PHP. Simply change the required variables in the code below and publish to a suitable web server. Then, preview the page(s) in your browser!

Please note: For this script to work correctly, the directory that holds rdf.php and your newsfeed page must be set to CHMOD 777, so that PHP can write the cached copy of the RDF file there.
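
If you want to confirm that PHP can write to the directory before going any further, a quick check along these lines may help. This is only a sketch: canWriteCache() is a made-up helper name, not part of rdf.php, and it assumes you save the check in the same directory as rdf.php.

```php
<?php
// Hypothetical helper (not part of rdf.php): reports whether PHP can
// create files in the given directory.
function canWriteCache($dir)
{
    return is_dir($dir) && is_writable($dir);
}

// __DIR__ is the directory this check file lives in.
if (canWriteCache(__DIR__)) {
    echo "OK: PHP can write the cached RDF file here.\n";
} else {
    echo "Not writable: change the permissions on " . __DIR__ . "\n";
}
```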

The first piece of code here is the base class file that forms the cornerstone of this script. For every separate news feed you want to use, you will create a specific PHP page, and each of these pages will use this base class in order to display the output headlines. Right, down to business.

Name this page rdf.php

The variables that can be changed or added to this file are as follows:
$numLinks = 3 :- This can be changed to the number of clickable headline links you want to show when the headlines are randomised.
function displayRDF($rdf, $randomise = True, $numLinks = 3) :- $randomise defaults to True, which shows $numLinks headlines picked at random from the whole file. Pass False as the second argument to show every headline instead. (* $numLinks needs to be lower than the number of headlines in the feed for the random selection to work appropriately)
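
To see why $numLinks has to stay below the number of headlines, here is a minimal sketch of random selection with array_rand(), the same function the class uses. The helper name pickRandomHeadlines() and the sample headlines are made up for this example. The sketch clamps the requested count, because array_rand() fails when asked for more entries than the array holds.

```php
<?php
// Hypothetical helper, not part of rdf.php: picks up to $numLinks
// headlines at random from an array.
function pickRandomHeadlines(array $headlines, $numLinks)
{
    if (count($headlines) === 0) {
        return array();
    }
    // array_rand() fails if asked for more entries than exist, so clamp first.
    $numLinks = max(1, min($numLinks, count($headlines)));
    // The cast keeps the result an array when only a single key is returned.
    $keys = (array) array_rand($headlines, $numLinks);

    $picked = array();
    foreach ($keys as $key) {
        $picked[] = $headlines[$key];
    }
    return $picked;
}

// Made-up headlines for illustration.
$headlines = array("First", "Second", "Third", "Fourth", "Fifth");

foreach (pickRandomHeadlines($headlines, 3) as $headline) {
    echo $headline . "\n";
}
```

Asking for 10 headlines from this list of five would simply return all five.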


<?PHP
class rdf // A util Class for reading in RDF XML Headline style files...
  { 
  function createRDF($url, $timeToLive)
  {
    // ************************************************************* //
    // NB CHMOD THE WORKING DIR TO 777 TO ALLOW PHP TO CREATE FILES! //
    // ************************************************************* //

    $basename = basename($url);

    if (!file_exists($basename))
    {
      // Create an empty cache file dated just after the Unix epoch, so it is
      // always older than the time to live and gets refreshed straight away.
      touch($basename, ($timeToLive + 1));
      chmod($basename, 0777);
    }
  }
  
  // Reads the data in from the passed-in URL if the cached copy is too old.
  function getRDF($url, $timeToLive)
  {
    $timeToLive *= 60; // Convert timeToLive from minutes into seconds

    // Get the basename of the URL for the cached file name.
    // We do this here because we need it more than once.
    $basename = basename($url);

    $this->createRDF($url, $timeToLive);

    clearstatcache(); // Make sure we see the real timestamp, not a cached one
    $timestamp = filemtime($basename); // Get the timestamp of the file
    $age = (time() - $timestamp); // Work out how old our cached file is

    if ($age > $timeToLive) // If the file is too old, refresh it from the URL
    {
      $rdfHandle = fopen($url, "r"); // Open the RDF feed for reading
      if (!$rdfHandle)
        return; // Feed unreachable, so keep whatever is cached

      // Read in the RDF data. 64K limit on filesize, should be enough.
      // stream_get_contents() keeps reading until the limit or the end of the
      // feed, whereas a single fread() on a remote stream can stop after one chunk.
      $rdfData = stream_get_contents($rdfHandle, 64000);
      fclose($rdfHandle); // Close the data feed

      // OK there is more recent news, so rewrite the cached news file
      $localFile = fopen($basename, "w"); // Open the local file for writing
      fwrite($localFile, $rdfData); // Write all the data into the file
      fclose($localFile); // Close the local file after writing to it
    } // end IF
  } // end getRDF
  
  // Pulls the link and title out of an item and prints them as a headline link.
  function formatLink($item)
  {
    $link = preg_replace('#.*<link>#s', "", $item); // Remove everything up to the opening <link> tag
    $link = preg_replace('#</link>.*#s', "", $link); // Remove the closing </link> tag and what follows
    $title = preg_replace('#.*<title>#s', "", $item); // Remove everything up to the opening <title> tag
    $title = preg_replace('#</title>.*#s', "", $title); // Remove the closing </title> tag and what follows
    if ($title) // If we got anything left after all that trimming...
    {
      // Choose how you want the link formatted here... This has no underline,
      // and opens in a new window...
      echo "<a href=\"$link\" style=\"text-decoration:none\" target=\"_blank\">$title</a><br>";
    }
  } // end formatLink
  
  function displayRDF($rdf, $randomise = True, $numLinks = 3)
  {
    $rdfData = file_get_contents($rdf); // Read the local RDF file into memory

    // Get rid of all spurious leading and closing rdf data from the data in memory
    $rdfData = preg_replace('#<\?xml.*/image>#s', "", $rdfData);
    $rdfData = preg_replace('#</rdf.*#s', "", $rdfData);
    $rdfData = preg_replace('#[\r\n]#', "", $rdfData); // Strip line breaks
    $rdfData = chop($rdfData); // Strip any whitespace from the end of the string

    // Split up the string into an array to make it more manageable
    $rdfArray = explode("</item>", $rdfData);
    $max = sizeof($rdfArray); // See how many chunks we have got
    if ($max > 1)
    {
      // Echo the font formatting... This is just HTML to make it look a little pretty
      echo "<font face=\"verdana, arial, helvetica\" size=\"1\">";

      if ($randomise) // We need to do different stuff if we are to randomise the links...
      {
        // The links will be randomised so we want a different message to the user....
        // The max - 1 is to compensate for the chunk left over after the last </item>
        $numLinks = max(1, min($numLinks, $max - 1)); // Never ask for more links than we have
        echo "Displaying $numLinks (of " . ($max - 1) . ") random headlines from $rdf... Updated every 30 minutes.. Refresh for some more!<br>";
        // Select the keys of the links at random, leaving out the trailing chunk.
        // The cast keeps $links an array even when only one key is returned.
        $links = (array) array_rand(array_slice($rdfArray, 0, $max - 1), $numLinks);
        $upperLimit = $numLinks; // Set this to the number of links to be displayed
      } else {
        echo "Displaying headlines from $rdf <br> ";
        $links = array_keys($rdfArray); // Give the keys to be displayed all the links we have parsed
        $upperLimit = $max; // Set the upper limit to be all of the headlines
      }

      // Display the links...
      for ($i = 0; $i < $upperLimit; $i++)
      {
        $this->formatLink($rdfArray[$links[$i]]);
      }

      // Close the font formatting like a good html coder ;)
      echo "</font>";
    } else {
      echo "Sorry, no links found in the RDF file $rdf...";
    }
  } // end displayRDF
  }
?>
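
If you want to see the split-and-trim idea on its own, without a web server or a live feed, the following sketch runs it against a small inline sample. The sample feed text and the helper name extractItem() are made up for this example. It uses preg_replace(), the regular-expression function available in current PHP versions, and returns the values instead of echoing HTML.

```php
<?php
// Made-up sample data; a real feed would come from the cached RDF file.
$sample = '<item><title>Headline one</title><link>http://example.com/1</link></item>'
        . '<item><title>Headline two</title><link>http://example.com/2</link></item>';

// Hypothetical helper: trims an item chunk down to its title and link.
function extractItem($item)
{
    $link  = preg_replace('#.*<link>#s', '', $item);    // Drop everything up to <link>
    $link  = preg_replace('#</link>.*#s', '', $link);   // Drop </link> and what follows
    $title = preg_replace('#.*<title>#s', '', $item);   // Drop everything up to <title>
    $title = preg_replace('#</title>.*#s', '', $title); // Drop </title> and what follows
    return array('title' => $title, 'link' => $link);
}

// Splitting on </item> leaves one empty chunk after the last item, so drop it.
$chunks = explode('</item>', $sample);
array_pop($chunks);

$items = array_map('extractItem', $chunks);
foreach ($items as $item) {
    echo $item['title'] . ' -> ' . $item['link'] . "\n";
}
```

This approach relies on each item holding exactly one title and one link. For feeds with a less regular layout, a proper XML parser would be a safer choice.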

The following code is an example of a PHP page that you would request to display the headlines from the relevant website.

This page can be called whatever you like - it will be the page that you type into the address bar or request via a hyperlink in order to display the headlines.

Variables to change here:
$url = "http://path.to.remote/url.rdf" :- Change this to the remote RDF file URL.

<?PHP
// OK this file will get the headlines from your chosen feed (Slashdot, for example)...
include("rdf.php"); // Include our RDF util class

$rdf = new rdf(); // Create a new rdf object

// Define the URL to get the RDF file from.
// *** EDIT THE URL BELOW TO YOUR RDF FILE URL! ***
$url = "http://path.to.remote/url.rdf";
// Work out the name that we are going to use as a local cached file.
$base = basename($url);

$rdf->getRDF($url, 30); // Get the RDF headlines, refreshing the cache if it is over 30 minutes old

// This shows what happens when you don't want to randomise the links and
// want to display them all...
$rdf->displayRDF($base, False); // Display the RDF headlines...
?>
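
The 30 passed to getRDF() above is the cache lifetime in minutes. The sketch below shows the same age check in isolation, so you can experiment with it safely. The helper name isStale() is made up for this example, and it works on a temporary file rather than a real cached feed.

```php
<?php
// Hypothetical helper: true when $file is missing or older than $minutes.
function isStale($file, $minutes)
{
    clearstatcache(); // Avoid reading a cached timestamp
    if (!file_exists($file)) {
        return true;
    }
    return (time() - filemtime($file)) > ($minutes * 60);
}

// Create a temporary file to stand in for the cached feed.
$cache = tempnam(sys_get_temp_dir(), 'rdf');

// A file created just now is younger than 30 minutes.
echo isStale($cache, 30) ? "refresh from URL\n" : "use cached copy\n";

// Pretend the file was last written 31 minutes ago.
touch($cache, time() - 31 * 60);
echo isStale($cache, 30) ? "refresh from URL\n" : "use cached copy\n";

unlink($cache); // Tidy up the temporary file
```

A shorter lifetime keeps the headlines fresher but fetches the remote file more often.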

