- Helma NG on App Engine. It uses the ServerJS Securable Modules system, Jack and Rhino, and seems like it'll be nice once it's stable. I got a basic "fetch a remote file and print it" working, but haven't yet worked out how to get a DOM from a remote document, or how to import env.js and Sizzle.
- Stefano Mazzocchi's Sizzle-based scraping app in Acre. Modified NekoHTML parser and env.js, plus Sizzle for selectors. Hosted by Freebase.
- Jaxer. Develop and run in Aptana Studio, an Eclipse plugin. DOM scraping using the same engine as Firefox 3. Can be deployed to the Aptana Cloud.
- Headless Firefox 3/XulRunner and HTTP socket for communication. Still not sure if it'll run headless properly, without any interaction.
Update: Yahoo! announced YQL Execute yesterday, which allows server-side JavaScript (including CSS 3 selectors) to be executed between fetching the DOM and returning YQL results. The only problem with YQL is that, because it obeys robots.txt rules, it's often denied access to web content.
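To make the robots.txt restriction concrete, here is a rough sketch of the kind of check a well-behaved fetcher might run before requesting a page. It is not YQL's actual implementation: the function names (`parseRobots`, `isAllowed`), the `ExampleBot` user agent and the sample rules are all made up for illustration, and the parser is deliberately simplified (only `User-agent` and `Disallow` lines, plain prefix matching, no `Allow` rules or wildcards).

```javascript
// Simplified robots.txt check (illustrative only, not YQL's code).
// Handles User-agent and Disallow lines with plain prefix matching;
// Allow rules and wildcard patterns are ignored.
function parseRobots(text) {
  var groups = [];          // each group: { agents: [...], disallow: [...] }
  var current = null;
  var lastWasAgent = false;
  text.split(/\r?\n/).forEach(function (raw) {
    var line = raw.replace(/#.*$/, "").trim();   // strip comments
    var m = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(line);
    if (!m) return;
    var field = m[1].toLowerCase();
    var value = m[2].trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed".
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  });
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  var groups = parseRobots(robotsText);
  var ua = userAgent.toLowerCase();
  // Prefer a group that names this agent; otherwise fall back to "*".
  var match = groups.filter(function (g) {
    return g.agents.some(function (a) {
      return a !== "*" && a !== "" && ua.indexOf(a) !== -1;
    });
  })[0] || groups.filter(function (g) {
    return g.agents.indexOf("*") !== -1;
  })[0];
  if (!match) return true;  // no applicable rules: everything is allowed
  return !match.disallow.some(function (prefix) {
    return path.indexOf(prefix) === 0;
  });
}

// Hypothetical rules: everyone is kept out of /private/,
// and "ExampleBot" is kept out of the whole site.
var sample = "User-agent: *\nDisallow: /private/\n\n" +
             "User-agent: ExampleBot\nDisallow: /\n";

console.log(isAllowed(sample, "ExampleBot/1.0", "/index.html")); // false
console.log(isAllowed(sample, "OtherBot", "/index.html"));       // true
console.log(isAllowed(sample, "OtherBot", "/private/x"));        // false
```

A site that lists a service's crawler under a blanket `Disallow: /`, as in the made-up sample, shuts out every page for that service, which is the situation a scraper going through a robots.txt-respecting proxy would run into.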