Using Google to Fetch All of a Feed's Items


Google has an official Feed API and several other methods that let you retrieve historical items from a feed.

The easiest way to access the items for one feed is to log in to your Google account in a browser, load, then save the resulting Atom file.

To access the items programmatically, the following options are available:

If you don't need all the original metadata for each item, you can fetch a JSON representation of each item, as used in Google Reader's UI:

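$feed = urlencode(''); // URL of the feed
$continuation = '';
do {
  $url = sprintf('', $feed, $continuation);
  $json = file_get_contents($url);
  $data = json_decode($json);
  if (!$data) break; // request failed or the response was not valid JSON
  // the continuation token identifies the next page of items
  $continuation = isset($data->continuation) ? $data->continuation : '';
  print "Continuation: $continuation\n";
  foreach ($data->items as $item) {
    print_r($item); // process each item here
  }
} while ($continuation);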

This approach has several advantages: it returns every item from the feed (more than 6500, at least), doesn't require you to be logged in, is easy to parse, and includes elements such as enclosures. It doesn't, however, include some elements, such as the original id for each entry.
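The continuation loop above can be tried out offline. This is a minimal sketch, assuming a hypothetical fetch_page() function in place of the real HTTP request and made-up JSON pages that carry only the items and continuation fields used above. It also assumes the last page simply omits the continuation token, which is what the loop condition above implies.

```php
<?php
// Hypothetical stand-in for the HTTP request: maps a continuation token to a JSON page.
// The page contents are invented for illustration; only the "items" and
// "continuation" fields mirror the structure used in the loop above.
function fetch_page($continuation) {
  $pages = array(
    '' => '{"items":[{"title":"a"},{"title":"b"}],"continuation":"p2"}',
    'p2' => '{"items":[{"title":"c"}],"continuation":"p3"}',
    'p3' => '{"items":[{"title":"d"}]}', // assumed: the last page has no continuation token
  );
  return isset($pages[$continuation]) ? $pages[$continuation] : false;
}

// Collect every item by following continuation tokens until none is returned.
function fetch_all_items($fetch) {
  $items = array();
  $continuation = '';
  do {
    $json = call_user_func($fetch, $continuation);
    $data = $json ? json_decode($json) : null;
    if (!$data) {
      break; // request failed or the response was not valid JSON
    }
    if (isset($data->items)) {
      foreach ($data->items as $item) {
        $items[] = $item;
      }
    }
    $continuation = isset($data->continuation) ? $data->continuation : '';
  } while ($continuation !== '');
  return $items;
}

$all = fetch_all_items('fetch_page');
print count($all) . " items\n";
```

Passing the fetcher in as a callback keeps the paging logic separate from the request itself, so the same loop could be pointed at file_get_contents() once the real URL is filled in.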

The official method is to use the AJAX Feed API:

$feed = ''; // URL of the feed
$params = array(
  'q' => $feed,
  'v' => '1.0', // API version
  'num' => -1, // maximum entries (limited)
  'output' => 'json_xml', // mixed content: JSON for feed, XML for full entries (json|xml|json_xml)
  'scoring' => 'h', // include historical entries
);
$result = file_get_contents('' . http_build_query($params));
$json = json_decode($result);
$data = $json->responseData;
// json version
foreach ($data->feed->entries as $entry) {
  print_r($entry); // process each entry here
}
// xml version
$xml = simplexml_load_string($data->xmlString);
foreach ($xml->channel->item as $item) { // only matches RSS2 - need namespace for Atom
  print_r($item); // process each item here
}

This way you get the full, original XML version of the feed, but it's not normalised (which makes it harder to parse; the Javascript API has a parsing function built in) and it only contains a limited number of entries (the limit seems to be 250, and json_decode has problems).
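Because the XML is not normalised, the parsing code has to cope with both RSS 2.0 and Atom. The following is a rough sketch of one way to do that with SimpleXML, not a complete feed parser. The sample documents and the entry_titles() helper are made up for illustration; the only fixed value is the standard Atom namespace URI, http://www.w3.org/2005/Atom.

```php
<?php
// Made-up sample documents standing in for $data->xmlString.
$atom = '<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><title>First</title><id>urn:example:1</id></entry>
  <entry><title>Second</title><id>urn:example:2</id></entry>
</feed>';
$rss = '<rss version="2.0"><channel><title>Example</title>
  <item><title>Only</title><guid>urn:example:3</guid></item>
</channel></rss>';

// Hypothetical helper: return the entry titles from an RSS 2.0 or Atom document.
function entry_titles($xmlString) {
  $atomNs = 'http://www.w3.org/2005/Atom';
  $titles = array();
  $xml = simplexml_load_string($xmlString);
  if ($xml === false) {
    return $titles; // not well-formed XML
  }
  if ($xml->getName() == 'feed') {
    // Atom: elements live in the Atom namespace, so register a prefix for XPath
    $xml->registerXPathNamespace('atom', $atomNs);
    foreach ($xml->xpath('/atom:feed/atom:entry') as $entry) {
      $children = $entry->children($atomNs);
      $titles[] = (string) $children->title;
    }
  } else {
    // RSS 2.0: core elements are not namespaced
    foreach ($xml->channel->item as $item) {
      $titles[] = (string) $item->title;
    }
  }
  return $titles;
}

print implode(', ', entry_titles($atom)) . "\n";
print implode(', ', entry_titles($rss)) . "\n";
```

Checking the root element name is a simple way to tell the two formats apart; feeds in other formats (RSS 1.0, for example) would need their own branch.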

If you're logged in, you can fetch a normalised Atom representation of the feed:

$feed = urlencode(''); // URL of the feed
$params = array('Email' => 'YOUR_GOOGLE_EMAIL', 'Passwd' => 'YOUR_GOOGLE_PASSWORD');
$context = stream_context_create(array('http' => array('method' => 'POST', 'content' => http_build_query($params))));
$result = file_get_contents('', NULL, $context);
// the session id is the value after "=" on the first line of the response
// (array_shift and array_pop take their argument by reference, so use variables)
$lines = explode("\n", $result);
$parts = explode('=', array_shift($lines));
$sid = array_pop($parts);
$cookie = array(
  'SID=' . $sid,
);
$header = sprintf("Cookie: %s\r\n", implode('; ', $cookie));
$context = stream_context_create(array('http' => array('method' => 'GET', 'header' => $header)));
$continuation = '';
do {
  $url = sprintf('', $feed, $continuation);
  $data = file_get_contents($url, NULL, $context);
  $xml = simplexml_load_string($data);
  if (!$xml) break; // request failed or the response was not valid XML
  $xml->registerXPathNamespace('atom', '');
  $xml->registerXPathNamespace('gr', '');
  $nodes = $xml->xpath('/atom:feed/gr:continuation');
  $continuation = $nodes ? (string) $nodes[0] : '';
  print "Continuation: $continuation\n";
  $items = $xml->xpath('/atom:feed/atom:entry');
  foreach ($items as $item) {
    print_r($item); // process each entry here
  }
} while ($continuation);

This returns an apparently unlimited number of normalised Atom entries (> 6500, at least).
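The login step above reads the session id from the first line of the response. A slightly more defensive sketch is to parse every line into a key/value array and look up SID by name. This assumes the response body is a series of newline-separated key=value lines, as the code above implies; the sample body, the key names other than SID, and the parse_login_response() helper are made up for illustration.

```php
<?php
// Made-up response body; real values would be long opaque tokens.
$result = "SID=abc123\nLSID=def456\nAuth=ghi=789\n";

// Hypothetical helper: parse newline-separated key=value lines into an array.
function parse_login_response($body) {
  $values = array();
  foreach (explode("\n", trim($body)) as $line) {
    // split on the first "=" only, so values may themselves contain "="
    $parts = explode('=', $line, 2);
    if (count($parts) == 2) {
      $values[$parts[0]] = $parts[1];
    }
  }
  return $values;
}

$values = parse_login_response($result);
if (!isset($values['SID'])) {
  die("Login failed: no SID in response\n");
}
$header = sprintf("Cookie: %s\r\n", 'SID=' . $values['SID']);
print $header;
```

Looking the value up by key means the code no longer depends on SID being the first line, and a failed login is reported instead of silently producing an empty cookie.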