After another attempt at getting XPath to work in Rhino (I got a bit further, but not far enough; see the XPath bits at the end of env.js), I decided to finish getting the socket listener code from Crowbar working with Zotero. In practice this means you run Zotero inside an instance of Firefox that listens on port 10000, then pass it a URL. Firefox loads the page, Zotero parses it, and the extracted metadata is returned as JSON.
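From the client's side, the round trip is simply "send a URL to port 10000, get JSON back". Below is a minimal Python sketch of such a client. It assumes the listener runs on 127.0.0.1:10000 and, hypothetically, that the page to scrape is passed in a Crowbar-style `url` query parameter; the actual request format expected by zowbar.js may differ. The names `build_request_url`, `parse_metadata` and `fetch_metadata` are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Firefox/Zotero listener is on localhost, port 10000.
ZOWBAR_HOST = "127.0.0.1"
ZOWBAR_PORT = 10000


def build_request_url(target_url, host=ZOWBAR_HOST, port=ZOWBAR_PORT):
    """Build the URL to send to the listener.

    Hypothetical: assumes a Crowbar-style 'url' query parameter. The
    target URL is percent-encoded so its own '?' and '&' survive intact.
    """
    query = urllib.parse.urlencode({"url": target_url})
    return f"http://{host}:{port}/?{query}"


def parse_metadata(body):
    """Decode the JSON body returned by the listener into Python objects."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


def fetch_metadata(target_url, timeout=60.0):
    """Ask the running Firefox/Zotero instance to load target_url and
    return whatever metadata it extracts. Needs the listener running."""
    request_url = build_request_url(target_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return parse_metadata(response.read())
```

With the modified Firefox running, something like `fetch_metadata("http://www.example.org/article")` would return the decoded JSON as a dict or list. A generous timeout is used because Firefox has to load and render the whole page before Zotero can run its translators.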

If you want to try it, get the development version of Zotero, put zowbar.js in the chrome/content/zotero folder, and apply the overlay patch so that it gets loaded. You also need a recent nightly build of Firefox 3, because the code uses Firefox's new native JSON encoding. Install the modified Zotero as an extension in that Firefox (start Firefox with -P to create a new development profile and -no-remote to run it as a separate instance), then pass it a URL, e.g.:

TODO: Close the appropriate tab for the Zotero object, so it can scrape more than one URL at once. Override (cancel) dialogs and other things that would hold up the browser. Look into using Xvfb to run Firefox headless on a server.