PHP script for downloading MP4 files from iPlayer

Paul Battley wrote a Ruby script to get around the BBC's continued attempts to stop people downloading MP4 files from iPlayer. Here's a version in PHP using libcurl:
$ch = curl_init();

// fetch the HTML and extract the PID
curl_setopt_array($ch, array(
  CURLOPT_URL => $argv[1], // URL of the iPlayer viewing page
  CURLOPT_RETURNTRANSFER => TRUE,
  CURLOPT_FOLLOWLOCATION => TRUE,
  CURLOPT_USERAGENT => 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3',
  CURLOPT_COOKIEJAR => '/tmp/cookies.txt',
));

$html = curl_exec($ch);
if (! preg_match("/\bpid\s+:\s+'(.+?)'/", $html, $matches)) exit('No PID');
$pid = $matches[1];

// extract the title + episode
$title = '';
if (preg_match("/iplayer\.pid\s+=\s+'(.+?)'/", $html, $matches)) {
  $xml = simplexml_load_file("http://www.bbc.co.uk/iplayer/metafiles/episode/{$matches[1]}.xml");
  if ($xml !== false) {
    $title = (string) $xml->concept->title;
    if ((string) $xml->concept->subtitle !== '')
      $title .= ' - ' . $xml->concept->subtitle;
    // keep only characters that are safe in a filename
    $title = preg_replace('/[^a-z0-9 \-]/i', '', $title);
  }
}

if (!$title) $title = 'Untitled';

// fetch the content range value
curl_setopt_array($ch, array(
  CURLOPT_URL => "http://www.bbc.co.uk/mediaselector/3/auth/iplayer_streaming_http_mp4/$pid?" . rand(0, 1000000),
  CURLOPT_USERAGENT => 'Apple iPhone v1.1.4 CoreMedia v1.0.0.4A102',
  CURLOPT_RANGE => '0-1',
  CURLOPT_HEADER => TRUE,
));

$response = curl_exec($ch);
// header names are case-insensitive, hence the /i flag
if (! preg_match('/\bContent-Range: bytes 0-1\/(\d+)/i', $response, $matches)) exit('No Content-Range');

// fetch the movie file
$out = fopen("$title.mov", 'w');
if (! $out) exit("Cannot open $title.mov for writing");
curl_setopt_array($ch, array(
  CURLOPT_RANGE => '0-' . $matches[1],
  CURLOPT_HEADER => FALSE,
  CURLOPT_NOPROGRESS => FALSE,
  CURLOPT_RETURNTRANSFER => FALSE,
  CURLOPT_FILE => $out,
));

curl_exec($ch);
curl_close($ch);
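
The Content-Range step can be tried on its own, without touching the network. What follows is only a minimal sketch, not part of the script: totalBytesFromHeaders() is a hypothetical helper and the sample headers are made up for illustration. It pulls the total size out of a "Content-Range: bytes 0-1/N" header, matching the header name case-insensitively.

```php
<?php
// Hypothetical helper (not in the script above): returns the total size
// reported by a "Content-Range: bytes 0-1/N" header, or null if absent.
function totalBytesFromHeaders(string $headers): ?int
{
    // "i" because HTTP header names are case-insensitive;
    // "m" so that ^ matches at the start of each header line.
    if (preg_match('/^Content-Range:\s*bytes\s+0-1\/(\d+)/mi', $headers, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Made-up sample response headers, for illustration only.
$sample = "HTTP/1.1 206 Partial Content\r\n"
        . "Content-Type: video/mp4\r\n"
        . "content-range: bytes 0-1/1048576\r\n"
        . "\r\n";

$total = totalBytesFromHeaders($sample);
echo $total === null ? "No Content-Range\n" : "Total bytes: $total\n";
```

Returning null rather than exiting keeps the parsing separate from the decision about what to do when the header is missing.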

Comments

Nice, much more concise than mine! You should use the metadata XML for the programme instead of screen-scraping, though: it'll stop the script breaking if the page is redesigned.

Updated today to use the metadata XML and the full user agent from http://beebhack.bluwiki.com/
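
For anyone curious how the title handling behaves, here is a rough sketch that can be run offline. titleFromMetadata() is a hypothetical helper, and the sample XML (including its root element name) is made up; only the concept/title and concept/subtitle elements are taken from what the script reads.

```php
<?php
// Hypothetical helper: builds a filename-safe title from metadata XML,
// using the same concept/title and concept/subtitle elements the script reads.
function titleFromMetadata(string $xmlText): string
{
    $xml = simplexml_load_string($xmlText);
    if ($xml === false) {
        return 'Untitled';
    }
    $title = (string) $xml->concept->title;
    $subtitle = (string) $xml->concept->subtitle;
    if ($subtitle !== '') {
        $title .= ' - ' . $subtitle;
    }
    // Keep only letters, digits, spaces and hyphens, as the script does.
    $title = preg_replace('/[^a-z0-9 \-]/i', '', $title);
    return $title !== '' ? $title : 'Untitled';
}

// Made-up sample: the root element name is an assumption, and the real
// metadata file will contain more than this.
$sample = '<iplayerMedia>
  <concept>
    <title>Newsnight</title>
    <subtitle>10/04/2008</subtitle>
  </concept>
</iplayerMedia>';

echo titleFromMetadata($sample), "\n";
```

Note that the slashes in the subtitle are stripped along with any other punctuation, since they are not safe in a filename.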

I'm a PHP newbie. I saved the above script as a PHP file, then had to get cURL installed on WAMP; now I get a 'No PID' error.

Looking at the script, you set the URL from $argv[1]. I've changed line 5 to...

CURLOPT_URL => 'http://news.bbc.co.uk/1/hi/programmes/newsnight/7343060.stm',

...just to check that the above is OK. Secondly, what URL should I be using for this BBC page (the embedded iPlayer movie near the top of the page): http://news.bbc.co.uk/1/hi/programmes/newsnight/7343060.stm ?

Rgds,
T

Posted by: T on April 15, 2008 12:42 PM

Been playing further and found an alternative iPlayer URL: http://www.bbc.co.uk/iplayer/page/item/b009wxm9.shtml

I've looked at the HTML source of this page and been trying different URL values on line 5 (replacing $argv with the URL in single quotes, followed by a comma).

Thus far I get a mixture of 'No PID' or 'No Content-Range' error messages.

Any help Jedi Master would be greatly appreciated,
Padawan T

Posted by: T on April 15, 2008 1:06 PM

T: $argv[1] means the first argument that's passed to the script on the command line. As you found, it's the URL for the episode page on iPlayer that you need to use, so the script is run like this (assuming it's saved as iplayer_dl.php):

php iplayer_dl.php 'http://www.bbc.co.uk/iplayer/page/item/b009wxm9.shtml'
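
If it helps, a guard along these lines could go at the top of the script, so that a missing or malformed argument is caught before any request is made. This is only a sketch: episodeUrlFromArgs() is a hypothetical helper, not something in the script above, and the checks it makes are my assumptions about what a usable URL looks like.

```php
<?php
// Hypothetical helper: returns the episode page URL from the command-line
// arguments, or null if it is missing or is not an http(s) URL.
function episodeUrlFromArgs(array $argv): ?string
{
    // $argv[0] is the script name; $argv[1] is the first real argument.
    if (!isset($argv[1])) {
        return null;
    }
    $url = filter_var($argv[1], FILTER_VALIDATE_URL);
    if ($url === false || !preg_match('#^https?://#i', $url)) {
        return null;
    }
    return $url;
}

// Demonstration with literal argument lists instead of the real $argv.
var_dump(episodeUrlFromArgs(['iplayer_dl.php', 'http://www.bbc.co.uk/iplayer/page/item/b009wxm9.shtml']));
var_dump(episodeUrlFromArgs(['iplayer_dl.php']));
```

In the script itself you would pass the real $argv and exit with a usage message when the result is null.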

Unfortunately it doesn't look like anything's downloadable at the moment, judging by the few URLs I just tried.

Great script. Unfortunately I am outside the UK, so I cannot test it; none of the UK proxies worked for me. Is your script still working? This page says that they have changed the method: http://beebhack.wikia.com/wiki/IPhone_H.264_version. Thanks in advance for your help.

Posted by: tim on July 29, 2008 1:45 AM

No, this script isn't working any more.

Posted by: alf on July 29, 2008 8:04 AM
