Visual Scrapers


A couple of visual scraper utilities showed up recently: the web-based Dapper and the desktop-based OpenKapow.

OpenKapow is a download of over 100MB, about 80MB of which is the standard Java 1.5 runtime environment, which seems a bit ridiculous. It's an impressive piece of software once you get it running, with just about every option available. I couldn't work out how to get it to produce any output, though.

Dapper is visually slick and takes you through the process smoothly (with a few unnecessary sidesteps), but it's too limited to be useful: there's no tree view of the DOM, so it's impossible to select important elements in the page.

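So it's back to file_get_contents > Tidy > SimpleXML > XPath and regular expressions: fetch the page, tidy the markup into well-formed XHTML, load it as XML, then pick out elements with XPath and clean up the values with regular expressions. It works well enough.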
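For anyone who hasn't seen that chain written down, here's a rough sketch of what it might look like. The scrape_links() function, the sample markup and the "/item/<number>" URL pattern are all made up for illustration, and it assumes the tidy and SimpleXML extensions are installed. It takes an HTML string rather than a URL so that it runs without a network connection.

```php
<?php
// Hypothetical helper: returns one row per link found in (possibly messy) HTML.
function scrape_links(string $html): array
{
    // Tidy: repair the markup into well-formed XHTML. Numeric entities avoid
    // named entities such as &nbsp;, which a plain XML parser rejects.
    $xhtml = tidy_repair_string($html, [
        'output-xhtml'     => true,
        'numeric-entities' => true,
        'wrap'             => 0,
    ], 'utf8');
    if ($xhtml === false) {
        return [];
    }

    // SimpleXML: parse the repaired document.
    $xml = simplexml_load_string($xhtml);
    if ($xml === false) {
        return [];
    }

    // XPath: local-name() sidesteps the default XHTML namespace on the
    // tidied document, so no namespace prefix has to be registered.
    $links = $xml->xpath('//*[local-name()="a"][@href]') ?: [];

    $rows = [];
    foreach ($links as $a) {
        $href = (string) $a['href'];
        // Regular expression: pull a trailing numeric ID out of the URL, if any.
        $id = preg_match('~/(\d+)$~', $href, $m) ? (int) $m[1] : null;
        $rows[] = ['href' => $href, 'text' => trim((string) $a), 'id' => $id];
    }
    return $rows;
}

// Sample input with unclosed <li> tags, standing in for a fetched page.
$html = '<ul><li><a href="/item/12">First</a><li><a href="/item/34">Second</a></ul>';
print_r(scrape_links($html));
```

For a real page, the string would come from file_get_contents() called on the page's URL, which needs allow_url_fopen to be enabled. The local-name() trick keeps the XPath short; registering the XHTML namespace with registerXPathNamespace() would be the tidier option for longer queries.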