Feed Deltas: What's Changed?

When searching/fetching items through the EUtils interface to PubMed, you can specify an 'earliest' date (the mindate parameter), meaning that it's easy to fetch all the items added since a particular time.
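As a rough illustration of the date-bounded query, here is a minimal sketch that builds an ESearch URL for PubMed. The function name esearch_since_url and the choice of datetype "edat" (the Entrez date, i.e. when the record was added) are assumptions made for this example; E-utilities expects mindate and maxdate to be supplied together, so the sketch defaults maxdate to today.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_since_url(term, since, until=None, retmax=100):
    """Build an ESearch URL for PubMed records added on or after `since`.

    `esearch_since_url` is a hypothetical helper, not part of any library.
    """
    until = until or date.today()
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",  # assumption: filter on the date the record was added
        "mindate": since.strftime("%Y/%m/%d"),
        "maxdate": until.strftime("%Y/%m/%d"),  # mindate and maxdate go together
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Example: everything matching "asthma" added since the last poll.
url = esearch_since_url("asthma", date(2008, 1, 15), until=date(2008, 2, 1))
```

The client only needs to remember the date of its last successful poll and pass it as `since` next time.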

This is a feature missing from most feed providers: generally, all you can get is the last n items. That's fine if you're fetching items from a blog, where you can poll about once a day and be reasonably sure that no more than 10 or 20 new items will have been posted in the meantime, so you won't miss anything. However, it breaks down with feeds from last.fm or del.icio.us, for example, because more than 20 new items could easily have been added since you last requested the feed. Nor is it possible to use the feed to reach any further back into the archive.

One answer is RFC 3229: Delta encoding in HTTP, which was discussed (and implemented) by Bob Wyman and others in 2004. When the client asks for the feed, it sends the ETag from its previous fetch in an If-None-Match header, along with an A-IM header, and the server sends back only the modifications made to the feed since that ETag was issued. There's a list of clients that support RFC 3229, including the Universal Feed Parser, and Garrett Rooney made mod_speedyfeed for Apache2 (though that seems to use If-Modified-Since rather than ETag headers).
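The exchange can be sketched with a toy in-memory server. This is only an illustration under stated assumptions: the "feed" value for A-IM follows the convention used in the 2004 feed discussions rather than anything defined in RFC 3229 itself, and the names delta_request_headers, respond and etag_index are made up for the example. The 226 (IM Used) status code does come from RFC 3229.

```python
def delta_request_headers(etag):
    """Headers for a conditional GET that also asks for a feed delta."""
    return {
        "If-None-Match": etag,  # ETag received with the previous response
        "A-IM": "feed",         # assumption: the 'feed' instance manipulation
    }


def respond(entries, etag_index, current_etag, headers):
    """Toy server returning (status, response_headers, entries_to_send).

    `entries` is ordered oldest first. `etag_index` maps each ETag the
    server has issued to the number of entries that existed at that time.
    """
    known = etag_index.get(headers.get("If-None-Match"))
    if known is None or "feed" not in headers.get("A-IM", ""):
        # Unknown ETag or no delta support requested: send the whole feed.
        return 200, {"ETag": current_etag}, entries
    if known == len(entries):
        # Nothing new since that ETag was issued.
        return 304, {"ETag": current_etag}, []
    # Send only the entries added since that ETag was issued.
    return 226, {"ETag": current_etag, "IM": "feed"}, entries[known:]


entries = ["a", "b", "c"]
etag_index = {'"v1"': 1, '"v3"': 3}
status, resp_headers, new_items = respond(
    entries, etag_index, '"v3"', delta_request_headers('"v1"')
)
```

A client that does not send A-IM, or whose ETag the server no longer recognises, simply gets the full feed with a 200, so the scheme degrades gracefully.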

There probably needs to be support for paging too, in case the server wants to enforce a maximum number of items per response (100, say) and more new items than that are available. Atom has the 'link rel="next"' element, but that presumes you'd be starting with the newest item and working backwards, whereas in this case you'd be starting with the oldest item modified since you last polled the feed and working forwards. 'link rel="previous"' might work, perhaps. These link relations are described in RFC 5005: Feed Paging and Archiving.
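Walking forwards through pages might look something like the sketch below. It assumes the server links each page to the following, newer one with the RFC 5005 relation "previous" (the relation name is a parameter, since that choice is exactly what is in question here), and it takes a caller-supplied fetch function instead of making real HTTP requests. The names link_href and walk_pages are hypothetical, and relative hrefs are not resolved.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def link_href(feed_xml, rel):
    """Return the href of the first feed-level <link> with this rel, or None."""
    root = ET.fromstring(feed_xml)
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def walk_pages(start_url, fetch, rel="previous", max_pages=50):
    """Yield each page's XML, following `rel` links until none remain.

    `fetch` is any callable that maps a URL to the feed document as a string.
    Already-visited URLs and `max_pages` guard against link loops.
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        yield page
        url = link_href(page, rel)


# Two in-memory pages standing in for a server.
pages = {
    "p1": '<feed xmlns="http://www.w3.org/2005/Atom">'
          '<link rel="previous" href="p2"/></feed>',
    "p2": '<feed xmlns="http://www.w3.org/2005/Atom">'
          '<link rel="self" href="p2"/></feed>',
}
fetched = list(walk_pages("p1", pages.__getitem__))
```

Here the walk starts at the page holding the oldest unseen items and stops at the first page with no further link, which would be the current head of the feed.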

There's also Microsoft's FeedSync spec, though that may have some technical problems. Google were also said to be working on "a standard for feed publishers to tell aggregators about changes in the feed (this post has been deleted, etc.)".