Guardian Open Platform

The Guardian announced its Open Platform this morning. There are basically two parts:
  1. An API for querying and fetching articles (exposing their Endeca search interface, using Mashery for handling API keys and limits).
  2. A data store for data sets that journalists have collected as part of their research (a formalised version of the data sets released earlier this year).

You can display the full (unaltered) articles and, as is the trend, the service is free for up to 5,000 queries per day; beyond that, the agreement is to display ads in return for a share of the advertising revenue.

There are client libraries for various languages, but here's a quick PHP snippet for fetching articles (up to 25,000 this way) in a particular category:

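<?php
// Fetch science articles from the content API, one page at a time,
// and save each result as a JSON file in ./output.
define('API_KEY', 'YOUR_API_KEY');
define('API_URL', 'http://api.guardianapis.com/content');

$n = 50; // results per request
$i = 0;  // current page number

// Make sure the output directory exists before writing to it.
if (!is_dir('output')) {
  mkdir('output', 0777, true);
}

do {
  $start = $i * $n;
  $params = array(
    'api_key' => API_KEY,
    'content-type' => 'article',
    'filter' => '/science',
    'format' => 'json',
    'count' => $n,
    'start-index' => $start,
  );

  $response = file_get_contents(API_URL . '/search?' . http_build_query($params));
  $data = ($response === false) ? null : json_decode($response);
  // Stop if the request failed or the response is not the expected JSON.
  if (!isset($data->search->results)) {
    break;
  }

  foreach ($data->search->results as $item) {
    file_put_contents(sprintf('output/%d.js', $item->id), json_encode($item));
  }

  sleep(1); // pause between requests
  // Continue while more results remain (treating count as the total number
  // of matches), capped at 5000 requests, the free daily query limit.
} while ($data->search->count && $start + $n < $data->search->count && ++$i < 5000);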

Note that, according to the Terms & Conditions, "you must not keep any OPG Content for longer than 24 hours."
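
One way to stay inside that 24-hour window is to purge the saved files on a schedule. The following is only a rough sketch: the purge_stale_files() helper and its parameters are my own invention (nothing like it is provided by the API), and it assumes the files were written to a directory with a .js extension, as in the snippet above.

```php
<?php
// Hypothetical helper (not part of the Guardian API): delete saved article
// files in $dir that were last modified more than $maxAge seconds ago.
// Returns the number of files deleted.
function purge_stale_files($dir, $maxAge = 86400, $now = null) {
  $now = ($now === null) ? time() : $now;
  $deleted = 0;

  // glob() can return false on error; treat that as "nothing to delete".
  $files = glob(rtrim($dir, '/') . '/*.js');
  if ($files === false) {
    return 0;
  }

  clearstatcache(); // make sure filemtime() reads fresh values
  foreach ($files as $file) {
    $mtime = filemtime($file);
    if ($mtime !== false && $now - $mtime > $maxAge && unlink($file)) {
      $deleted++;
    }
  }
  return $deleted;
}
```

Calling purge_stale_files('output') from an hourly cron job, for example, would remove anything saved more than a day earlier. It goes by file modification time, so re-fetching an article resets its clock.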