Using Google to Fetch All of a Feed's Items


Google has an official Feed API and several other methods that let you retrieve historical items from a feed.

The easiest way to access the items for one feed is to log in to your Google account in a browser, load http://www.google.com/reader/atom/feed/ESCAPED_FEED_URL?n=10000, then save the resulting Atom file.
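
ESCAPED_FEED_URL is the feed's own address, percent-encoded so that it fits inside the path of the Reader URL. A minimal sketch of building that URL, assuming PHP's urlencode() is sufficient escaping (the snippets in this post use it the same way); the helper name reader_atom_url is hypothetical:

```php
<?php
// Hypothetical helper: builds the Reader Atom URL for a feed.
function reader_atom_url($feed, $count = 10000) {
  // urlencode() escapes ':' and '/' so the feed URL fits in one path segment
  return sprintf('http://www.google.com/reader/atom/feed/%s?n=%d', urlencode($feed), $count);
}

print reader_atom_url('http://special-j.net/feed/atom/') . "\n";
// prints http://www.google.com/reader/atom/feed/http%3A%2F%2Fspecial-j.net%2Ffeed%2Fatom%2F?n=10000
```

Paste the printed URL into a browser where you are logged in to your Google account.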

To access the items programmatically, the following options are available:

If you don't need all the original metadata for each item, you can fetch a JSON representation of each item, as used in Google Reader's UI:

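<?php
$feed = urlencode('http://special-j.net/feed/atom/');
$continuation = '';
do {
  // n = items per request, c = continuation token from the previous response
  $url = sprintf('http://www.google.com/reader/api/0/stream/contents/feed/%s?n=100&c=%s', $feed, $continuation);
  $json = file_get_contents($url);
  $data = $json ? json_decode($json) : null;
  if (!$data || !isset($data->items))
    break; // request failed or the response was not the expected JSON

  // stop once a response carries no continuation token
  $continuation = isset($data->continuation) ? $data->continuation : '';
  print "Continuation: $continuation\n";

  foreach ($data->items as $item)
    print_r($item);

} while ($continuation);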

This has several advantages: it returns every item from the feed (more than 6500, at least), doesn't require you to be logged in, is easy to parse, and includes elements like enclosures. It doesn't, however, include some elements, such as the original id for each entry.

The official method is to use the AJAX Feed API:

<?php
$feed = 'http://special-j.net/feed/atom/';
$params = array(
  'q' => $feed,
  'v' => '1.0', // API version
  'num' => -1, // maximum entries (limited)
  'output' => 'json_xml', // mixed content: JSON for feed, XML for full entries (json|xml|json_xml)
  'scoring' => 'h', // include historical entries
);
$result = file_get_contents('http://ajax.googleapis.com/ajax/services/feed/load?' . http_build_query($params));
$json = json_decode($result);
$data = $json->responseData;
// json version
foreach ($data->feed->entries as $entry)
  print_r($entry);
// xml version
$xml = simplexml_load_string($data->xmlString);
foreach ($xml->channel->item as $item) // only matches RSS2 - need namespace for Atom
  print_r($item);

This way you get the full, original XML version of the feed, but it's not normalised (which makes it harder to parse; the JavaScript API has a parsing function built in) and it only contains a limited number of entries (the limit seems to be 250, and json_decode has problems).
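
Because the XML is not normalised, an Atom feed has to be read with its namespace rather than through channel/item. A minimal sketch of one way to do that with SimpleXML, assuming the feed is Atom 1.0; the helper name atom_entry_titles and the sample document are made up for illustration:

```php
<?php
// Hypothetical helper: returns the title of every entry in an Atom document.
function atom_entry_titles($xmlString) {
  $xml = simplexml_load_string($xmlString);
  if ($xml === false)
    return array(); // not well-formed XML

  // XPath has no default namespace, so bind a prefix to the Atom namespace
  $xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
  $titles = array();
  foreach ($xml->xpath('/atom:feed/atom:entry') as $entry) {
    // children() with the namespace URI exposes the entry's Atom child elements
    $titles[] = (string) $entry->children('http://www.w3.org/2005/Atom')->title;
  }
  return $titles;
}

// Made-up sample document, standing in for $data->xmlString
$sample = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>'
  . '<entry><title>First</title></entry><entry><title>Second</title></entry></feed>';
print_r(atom_entry_titles($sample));
```

An RSS 2.0 feed returns an empty array here, since it has no atom:entry elements, so both loops can be run over the same response.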

If you're logged in, you can fetch a normalised Atom representation of the feed:

<?php
$feed = urlencode('http://special-j.net/feed/atom/');
$params = array('Email' => 'YOUR_GOOGLE_EMAIL', 'Passwd' => 'YOUR_GOOGLE_PASSWORD');
$context = stream_context_create(array('http' => array('method' => 'POST', 'content' => http_build_query($params))));
$result = file_get_contents('https://www.google.com/accounts/ClientLogin', NULL, $context);
// the first line of the response is "SID=..."
$lines = explode("\n", $result);
list(, $sid) = explode('=', $lines[0], 2);

// a Cookie request header carries only name=value pairs
$header = sprintf("Cookie: SID=%s\r\n", $sid);
$context = stream_context_create(array('http' => array('method' => 'GET', 'header' => $header)));
$continuation = '';
do{
  $url = sprintf('http://www.google.com/reader/atom/feed/%s?n=100&c=%s', $feed, $continuation);
  $data = file_get_contents($url, NULL, $context);
  $xml = simplexml_load_string($data);
  if ($xml === false)
    break; // request failed or the response was not valid XML
  $xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
  $xml->registerXPathNamespace('gr', 'http://www.google.com/schemas/reader/atom/');

  // xpath() returns an array; it is empty when there is no gr:continuation element
  $nodes = $xml->xpath('/atom:feed/gr:continuation');
  $continuation = $nodes ? (string) $nodes[0] : '';
  print "Continuation: $continuation\n";

  $items = $xml->xpath('/atom:feed/atom:entry');
  foreach ($items as $item)
    print_r($item);

} while ($continuation);

This returns an apparently unlimited number of normalised Atom entries (> 6500, at least).