$url = 'http://example.com/'; // placeholder: the address of the page to scrape.
$html = new DOMDocument();
@$html->loadHTMLFile($url); // fetch the remote HTML file and parse it (@ suppresses the warnings that malformed markup triggers).
$xml = simplexml_import_dom($html); // convert the DOM object to a SimpleXML object.
foreach ($xml->xpath('//a') as $node) { // run an XPath query and iterate through the array of results.
    print (string) $node . "\n"; // casting to string produces the text contents of the node.
    print $node['href'] . "\n"; // attributes of the node are accessible with array syntax.
    print $node->asXML() . "\n\n"; // asXML() produces the node's full markup as a string.
}
Note: if namespaces are involved, first call $xml->registerXPathNamespace('NAMESPACE_PREFIX', 'NAMESPACE_URI'); and then query with $xml->xpath('//NAMESPACE_PREFIX:ELEMENT'), replacing the text in capitals as appropriate.
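As a rough illustration of the namespace note, the sketch below queries a small Atom-style document held in a string. The sample feed and the "atom" prefix are assumptions made for the example; any prefix works as long as it is bound to the document's namespace URI.

```php
<?php
// Hypothetical sample document: every element lives in the Atom default namespace.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>
XML;

$xml = simplexml_load_string($feed);

// An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing.
$unprefixed = $xml->xpath('//entry');

// Bind a prefix of our choosing to the namespace URI, then use it in the query.
$xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
$titles = array_map('strval', $xml->xpath('//atom:entry/atom:title'));

print (empty($unprefixed) ? "unprefixed query: no match" : "unprefixed query: matched") . "\n";
print implode(", ", $titles) . "\n";
```

The prefix used in the query does not have to match whatever prefix (if any) the document itself uses; only the URI has to be the same.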
Comments

Thanks, this was useful. I was just looking at scraping with cURL, but this seems better...
I always find:
1. Fetch
2. Run through html tidy
3. Parse with simplexml
4. xpath fun
Works a treat, and can fix some... messy... pages.
You could run it through Tidy - I used to - but I find it easier to use PHP's built-in HTML parser instead.
Easy enough with DOMDocument, isn't it? I remember a while ago trying to scrape it all manually. No chance!
For me, SimpleXML is significantly easier to work with than DOMDocument (as long as you don't want to do anything too complicated).
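To make the DOMDocument versus SimpleXML comparison a little more concrete, here is a minimal sketch that extracts the same links both ways. The markup is a made-up, deliberately sloppy string (unclosed tags) so that it runs without a network connection, and libxml_use_internal_errors() is used here in place of the @ operator purely as one possible way of silencing parser warnings.

```php
<?php
// Hypothetical messy input: the <p> and <li> tags are never closed.
$markup = '<html><body><p>Links:<ul>'
        . '<li><a href="/one">One</a>'
        . '<li><a href="/two">Two</a>'
        . '</ul></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($markup);          // the built-in HTML parser repairs the markup
libxml_clear_errors();

// DOMDocument route: a separate DOMXPath object and method calls throughout.
$domLinks = [];
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $a) {
    $domLinks[$a->getAttribute('href')] = $a->textContent;
}

// SimpleXML route: xpath() on the element, array syntax for attributes,
// and a string cast for the text content.
$sxLinks = [];
foreach (simplexml_import_dom($dom)->xpath('//a') as $a) {
    $sxLinks[(string) $a['href']] = (string) $a;
}

print_r($domLinks);
print_r($sxLinks);
```

Both loops should build the same href-to-text map; the difference is mostly in how much ceremony each API needs for simple read-only queries.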