Metadata Scrapers


There are now at least five different groups of people working on scrapers that collect metadata for a given URL.

I'll accept that there's no chance of them all sharing the same scraper code, even with something like Rhino able to run JavaScript on a server, since BibDesk wouldn't be able to use that. There must be a way, though, to describe in XML the steps needed to fetch metadata for a URL.

I imagine it would need a regular expression for matching against the URL, and an XPath or regular expression for matching against the page at that URL. The values captured by those matches would then be used to build a new URL that returns the metadata, and you might then need another XPath or regular expression for extracting individual fields from that response.
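To make that concrete, here is a rough sketch in Python of what such a description and a tiny interpreter for it might look like. Everything in it is an assumption made up for illustration: the element names (`url-match`, `page-match`, `metadata-url`, `field`), the `var` attribute, the `journal.example.org` site and its RIS-style export format. It only uses regular expressions, not XPath, and it is not how any of the existing scrapers actually work.

```python
import re
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical description format: all element and attribute names are invented.
SCRAPER_XML = r'''
<scraper name="example-journal">
  <url-match var="id">^https?://journal\.example\.org/article/(\d+)$</url-match>
  <page-match var="doi">name="doi" content="([^"]+)"</page-match>
  <metadata-url>https://journal.example.org/export/{id}?doi={doi}</metadata-url>
  <field name="title">^TI  - (.+)$</field>
  <field name="year">^PY  - (\d{4})</field>
</scraper>
'''


def fetch_text(url):
    """Fetch a URL and return its body as text (a plain default fetcher)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(url, description, fetch=fetch_text):
    """Apply one XML scraper description to a URL.

    Returns a dict of metadata fields, or None if the description
    does not apply to this URL or page.
    """
    root = ET.fromstring(description)
    variables = {}

    # Step 1: match the URL itself and capture a value from it.
    url_rule = root.find("url-match")
    match = re.search(url_rule.text, url)
    if match is None:
        return None
    variables[url_rule.get("var")] = match.group(1)

    # Step 2 (optional): match the page at that URL and capture another value.
    page_rule = root.find("page-match")
    if page_rule is not None:
        match = re.search(page_rule.text, fetch(url))
        if match is None:
            return None
        variables[page_rule.get("var")] = match.group(1)

    # Step 3: build the metadata URL from the captured values and fetch it.
    metadata_url = root.find("metadata-url").text.format(**variables)
    record = fetch(metadata_url)

    # Step 4: pull each field out of the fetched record.
    metadata = {}
    for field in root.findall("field"):
        match = re.search(field.text, record, re.MULTILINE)
        if match:
            metadata[field.get("name")] = match.group(1).strip()
    return metadata
```

Calling `scrape(url, SCRAPER_XML)` would fetch over the network; passing a different `fetch` function makes it possible to try a description against saved pages. Since the description is just data, a client written in another language would only need its own small interpreter like the one above.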

Has anyone else tried to generalise scraping processes?