- Helma NG on App Engine. It looks promising once it's stable, and uses the ServerJS Securable Modules system, Jack and Rhino. I got a basic "fetch a remote file and print it" script working, but haven't yet worked out how to get a DOM from a remote document, or how to import env.js and Sizzle.
- Stefano Mazzocchi's Sizzle-based scraping app in Acre. Modified NekoHTML parser and env.js, plus Sizzle for selectors. Hosted by Freebase.
- Jaxer. Develop and run in Aptana Studio, an Eclipse plugin. DOM scraping using the same engine as Firefox 3. Can be deployed to the Aptana Cloud.
- Headless Firefox 3/XULRunner, with an HTTP socket for communication. I'm still not sure whether it will run properly headless, without any interaction.
Update: Yahoo! announced YQL Execute yesterday, which allows server-side JavaScript (including CSS 3 selectors) to be executed between DOM fetching and returning YQL results. The only problem with YQL is that, because it obeys robots.txt rules, it's often denied access to web content.
Comments

You missed out YQL's new Execute method, which is kind of like stored procedures for DOM scraping written in server-side JavaScript (announced today):
http://developer.yahoo.net/blog/archives/2009/04/yql_execute.html
It may be a shameless plug, but you should really check out ESXX at http://esxx.org/. Scraping an HTML page is a one-liner:
var doc = new URI("http://esxx.org/").load();
The 'doc' variable then contains an E4X node that can be accessed directly. For instance, the expression
doc.body..p[0]
returns the first paragraph, while
for each (let a in doc..a.(/^http:/.test(@href))) {
  // use 'a'
}
iterates over all 'a' elements whose 'href' attribute begins with 'http:'.
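For readers whose JavaScript engine lacks E4X, the same filter can be sketched in plain JavaScript. This is only an illustration, not part of ESXX: the `links` array is a hypothetical stand-in for the anchors of an already-parsed document, and `httpLinks` is a made-up helper name.

```javascript
// Hypothetical stand-in for the <a> elements of a parsed document.
const links = [
  { href: "http://esxx.org/", text: "ESXX" },
  { href: "/docs", text: "Docs" },
  { href: "mailto:someone@example.com", text: "Mail" },
  { text: "No href" },
];

// Keep only anchors whose href begins with "http:", mirroring the
// E4X filter doc..a.(/^http:/.test(@href)). Anchors without an
// href are skipped.
function httpLinks(anchors) {
  return anchors.filter(
    (a) => typeof a.href === "string" && /^http:/.test(a.href)
  );
}

for (const a of httpLinks(links)) {
  // use 'a'
  console.log(a.text, a.href);
}
```

Note that the pattern matches 'http:' literally, so 'https:' links are excluded, just as in the E4X version.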