Scraping web pages with PHP 5


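$html = new DOMDocument(); // loadHTMLFile() is an instance method, so create the document object first.
@$html->loadHTMLFile($url); // fetch the remote HTML file and parse it (@ suppresses warnings about malformed markup).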
$xml = simplexml_import_dom($html); // convert the DOM object to a SimpleXML object.
foreach ($xml->xpath('//a') as $node){ // run an XPath query and iterate through the array of results
  print (string) $node . "\n"; // casting to string produces the text contents of the node.
  print $node['href'] . "\n"; // attributes of the node are accessible using array syntax.
  print $node->asXML() . "\n\n"; // asXML() produces the whole XML string.
}
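
The same steps can be tried without a network connection. The following is a minimal sketch, not a definitive implementation: the markup in $markup is made-up sample data, and libxml_use_internal_errors() is swapped in for the @ operator so that parser warnings are collected rather than discarded outright.

```php
<?php
// Hypothetical sample markup standing in for a fetched page (an assumption for illustration).
$markup = '<html><body><p><a href="/one">First</a> <a href="/two">Second</a></p></body></html>';

libxml_use_internal_errors(true); // collect parser warnings internally instead of printing them.
$doc = new DOMDocument();
$doc->loadHTML($markup);          // parse the HTML string.
libxml_clear_errors();            // discard the collected warnings.

$xml = simplexml_import_dom($doc); // convert the DOM object to a SimpleXML object.
$links = array();
foreach ($xml->xpath('//a') as $node) {
    $links[] = array(
        'href' => (string) $node['href'], // attribute value.
        'text' => (string) $node,         // text contents of the node.
    );
}
print_r($links);
```

Collecting the results in an array first, rather than printing inside the loop, makes it easier to filter or de-duplicate the links afterwards.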

Note: if the elements you want are in an XML namespace, first register a prefix for it with

$xml->registerXPathNamespace('NAMESPACE_PREFIX', 'NAMESPACE_URI');

and then use that prefix in the query:

$xml->xpath('//NAMESPACE_PREFIX:ELEMENT')

replacing the text in capitals as appropriate.
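
As a rough illustration of the namespace note, here is a small sketch using an Atom-style document. The feed contents are invented for the example and the prefix "a" is an arbitrary choice; only the namespace URI has to match the one declared in the document.

```php
<?php
// Hypothetical feed with a default namespace (sample data, not from a real site).
$feed = '<feed xmlns="http://www.w3.org/2005/Atom">'
      . '<entry><title>First post</title></entry>'
      . '<entry><title>Second post</title></entry>'
      . '</feed>';

$xml = simplexml_load_string($feed);

// Bind the prefix "a" (our own choice) to the namespace URI used by the document.
$xml->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');

$titles = array();
foreach ($xml->xpath('//a:entry/a:title') as $title) {
    $titles[] = (string) $title; // text contents of each title element.
}
print implode("\n", $titles) . "\n";
```

Every element name in the query needs the prefix, because elements inside a default namespace do not match unprefixed names in XPath.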