Google has an official Feed API and several other methods that let you retrieve historical items from a feed.
The easiest way to access the items for one feed is to log in to your Google account in a browser, load http://www.google.com/reader/atom/feed/ESCAPED_FEED_URL?n=10000, then save the resulting Atom file.
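As a minimal sketch of how ESCAPED_FEED_URL can be produced, the snippet below URL-encodes the feed address the same way the longer examples in this post do. The helper name reader_atom_url is made up for illustration and is not part of any Google API.

```php
<?php
// Hypothetical helper (not part of any Google API): builds the Reader Atom URL
// for a feed by URL-encoding the feed address and appending the item count.
function reader_atom_url($feedUrl, $count = 10000) {
    return sprintf(
        'http://www.google.com/reader/atom/feed/%s?n=%d',
        urlencode($feedUrl),
        $count
    );
}

print reader_atom_url('http://special-j.net/feed/atom/') . "\n";
```

Paste the printed URL into a browser where you are logged in to your Google account.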
To access the items programmatically, the following options are available:
If you don't need all the original metadata for each item, you can fetch a JSON representation of each item, as used in Google Reader's UI:
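<?php
$feed = urlencode('http://special-j.net/feed/atom/');
$continuation = '';
do {
    // fetch up to 100 items per request; 'c' is the continuation token from the previous page
    $url = sprintf('http://www.google.com/reader/api/0/stream/contents/feed/%s?n=100&c=%s', $feed, $continuation);
    $json = file_get_contents($url);
    $data = $json ? json_decode($json) : null;
    if (!$data) {
        break; // request failed or the response was not valid JSON
    }
    // the last page has no continuation token
    $continuation = isset($data->continuation) ? $data->continuation : '';
    print "Continuation: $continuation\n";
    foreach ($data->items as $item) {
        print_r($item);
    }
} while ($continuation);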
This has several advantages: it returns every item from the feed (more than 6,500, at least), doesn't require you to be logged in, is easy to parse, and includes elements such as enclosures. However, it omits some elements, such as the original id of each entry.
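The continuation loop can also be separated from the HTTP request, which makes it easier to try out without network access. The following is only a sketch: fetch_all_items and the canned pages are invented for illustration, and the only assumption about the response is the one the listing above already makes (an items array plus an optional continuation token).

```php
<?php
// Hypothetical helper: follows continuation tokens until a page has none.
// $fetchPage is any callable that takes a continuation token and returns
// the decoded page (an object) or null on failure.
function fetch_all_items(callable $fetchPage, $maxPages = 1000) {
    $items = array();
    $continuation = '';
    for ($page = 0; $page < $maxPages; $page++) {
        $data = $fetchPage($continuation);
        if (!$data || empty($data->items)) {
            break; // failed request or empty page
        }
        foreach ($data->items as $item) {
            $items[] = $item;
        }
        if (empty($data->continuation)) {
            break; // last page
        }
        $continuation = $data->continuation;
    }
    return $items;
}

// Stand-in for the HTTP request: two canned pages with made-up content.
$pages = array(
    ''    => '{"continuation":"abc","items":[{"title":"one"},{"title":"two"}]}',
    'abc' => '{"items":[{"title":"three"}]}',
);
$fetchPage = function ($continuation) use ($pages) {
    return isset($pages[$continuation]) ? json_decode($pages[$continuation]) : null;
};

$items = fetch_all_items($fetchPage);
print count($items) . " items\n";
```

To use it against the real endpoint, replace the stand-in closure with one that calls file_get_contents on the stream/contents URL and passes the result to json_decode. The $maxPages cap is a safety net in case a token is ever repeated.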
The official method is to use the AJAX Feed API:
<?php
$feed = 'http://special-j.net/feed/atom/';
$params = array(
    'q' => $feed,
    'v' => '1.0', // API version
    'num' => -1, // maximum entries (limited)
    'output' => 'json_xml', // mixed content: JSON for feed, XML for full entries (json|xml|json_xml)
    'scoring' => 'h', // include historical entries
);
$result = file_get_contents('http://ajax.googleapis.com/ajax/services/feed/load?' . http_build_query($params));
$json = $result ? json_decode($result) : null;
if (!$json || empty($json->responseData)) {
    die("Unable to load or decode the feed\n");
}
$data = $json->responseData;
// json version
foreach ($data->feed->entries as $entry) {
    print_r($entry);
}
// xml version
$xml = simplexml_load_string($data->xmlString);
foreach ($xml->channel->item as $item) { // only matches RSS2 - need namespace for Atom
    print_r($item);
}
This way you get the full, original XML version of the feed, but it's not normalised (which makes it harder to parse; the JavaScript API has a parsing function built in) and it only contains a limited number of entries (seems to be 250; json_decode has problems).
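Because the XML is not normalised, the code has to handle RSS 2.0 and Atom separately. The sketch below shows one way to do that with SimpleXML, selecting Atom elements by namespace URI. The function name entry_titles and both sample documents are made up for illustration; a real $data->xmlString may need more cases (RSS 1.0, for instance).

```php
<?php
// Hypothetical helper: returns entry titles from either an RSS 2.0 or an Atom document.
function entry_titles($xmlString) {
    $atomNs = 'http://www.w3.org/2005/Atom';
    $titles = array();
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return $titles; // not well-formed XML
    }
    if ($xml->getName() === 'rss') {
        // RSS 2.0: items live under <channel> and have no namespace
        foreach ($xml->channel->item as $item) {
            $titles[] = (string) $item->title;
        }
    } else {
        // Atom: select children by namespace URI, which works whether the
        // document uses a default namespace or an explicit prefix
        foreach ($xml->children($atomNs)->entry as $entry) {
            $titles[] = (string) $entry->children($atomNs)->title;
        }
    }
    return $titles;
}

// Made-up sample documents standing in for $data->xmlString.
$atom = '<feed xmlns="http://www.w3.org/2005/Atom">'
      . '<entry><title>First</title></entry>'
      . '<entry><title>Second</title></entry>'
      . '</feed>';
$rss = '<rss version="2.0"><channel>'
     . '<item><title>Alpha</title></item>'
     . '</channel></rss>';

print implode(', ', entry_titles($atom)) . "\n";
print implode(', ', entry_titles($rss)) . "\n";
```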
If you're logged in, you can fetch a normalised Atom representation of the feed:
<?php
$feed = urlencode('http://special-j.net/feed/atom/');
// log in with ClientLogin; the first line of the response is "SID=..."
$params = array('Email' => 'YOUR_GOOGLE_EMAIL', 'Passwd' => 'YOUR_GOOGLE_PASSWORD');
$context = stream_context_create(array('http' => array(
    'method' => 'POST',
    'header' => "Content-Type: application/x-www-form-urlencoded\r\n",
    'content' => http_build_query($params),
)));
$result = file_get_contents('https://www.google.com/accounts/ClientLogin', false, $context);
if ($result === false) {
    die("Login request failed\n");
}
// take everything after the first '=' on the first line
// (assigning to variables avoids passing function results by reference)
$lines = explode("\n", $result);
$parts = explode('=', $lines[0], 2);
$sid = isset($parts[1]) ? $parts[1] : '';
$cookie = array(
    'SID=' . $sid,
    'domain=.google.com',
    'path=/',
    'expires=160000000000',
);
$header = sprintf("Cookie: %s\r\n", implode('; ', $cookie));
$context = stream_context_create(array('http' => array('method' => 'GET', 'header' => $header)));
$continuation = '';
do {
    $url = sprintf('http://www.google.com/reader/atom/feed/%s?n=100&c=%s', $feed, $continuation);
    $data = file_get_contents($url, false, $context);
    $xml = $data ? simplexml_load_string($data) : false;
    if ($xml === false) {
        break; // request failed or the response was not valid XML
    }
    $xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
    $xml->registerXPathNamespace('gr', 'http://www.google.com/schemas/reader/atom/');
    // the last page has no gr:continuation element
    $tokens = $xml->xpath('/atom:feed/gr:continuation');
    $continuation = $tokens ? (string) $tokens[0] : '';
    print "Continuation: $continuation\n";
    $items = $xml->xpath('/atom:feed/atom:entry');
    foreach ($items as $item) {
        print_r($item);
    }
} while ($continuation);
This returns an apparently unlimited number of normalised Atom entries (> 6500, at least).
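The login step in the listing above relies on the SID being on the first line of the ClientLogin response. A slightly more defensive sketch is to parse every key=value line and look the SID up by name. The function name parse_client_login and the sample response are made up for illustration; the line-per-field layout is an assumption based on how the listing reads the response.

```php
<?php
// Hypothetical helper: turns a "KEY=value" per-line response body into an array.
function parse_client_login($body) {
    $fields = array();
    foreach (preg_split('/\r?\n/', trim($body)) as $line) {
        // split on the first '=' only, in case the value itself contains '='
        $parts = explode('=', $line, 2);
        if (count($parts) === 2) {
            $fields[$parts[0]] = $parts[1];
        }
    }
    return $fields;
}

// Made-up sample response; real tokens are much longer.
$sample = "SID=abc123\nLSID=def456\nAuth=ghi=789\n";
$fields = parse_client_login($sample);
print $fields['SID'] . "\n";
```

Looking the value up by key means the code does not depend on the order of the lines in the response.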