Lots of venues publish events listings on their own sites, but few publish full RSS/Atom feeds and even fewer publish iCalendar feeds that aggregators can subscribe to. Sites like Time Out have large, searchable directories of events, but they don't have everything and are difficult to customise.

In an attempt to remedy this, I've started writing scrapers for venues that I'm interested in, and a framework to run them in. It's a similar set-up to CiteULike's scrapers: each scraper can be written in any scripting language that'll run on the command line (mine are in PHP, but there are good scraping libraries available for Ruby and Python); it just needs to output its results in a standard format.
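As a rough illustration of that set-up, here is a minimal sketch of a command-line scraper in Python, using only the standard library. It is not taken from the actual framework: the listing markup in SAMPLE_HTML, the class and function names, the PRODID and the venue.example domain are all assumptions made for the example, and a real scraper would fetch the venue's page rather than parse a hard-coded string. It also skips iCalendar's 75-octet line folding.

```python
import sys
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical listing markup; a real scraper would download the venue's page.
SAMPLE_HTML = """
<ul>
  <li class="event"><time datetime="2011-03-04T19:30">4 March</time>
      <a href="/events/1">Film night</a></li>
  <li class="event"><time datetime="2011-03-05T20:00">5 March</time>
      <a href="/events/2">Live jazz</a></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects one dict per <li class="event"> in the assumed markup."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._current = None
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "event":
            self._current = {"title": "", "url": "", "start": None}
        elif self._current is not None and tag == "time" and "datetime" in attrs:
            self._current["start"] = datetime.strptime(
                attrs["datetime"], "%Y-%m-%dT%H:%M")
        elif self._current is not None and tag == "a":
            self._current["url"] = attrs.get("href", "")
            self._in_link = True

    def handle_data(self, data):
        if self._in_link:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "li" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            if self._current["start"] is not None:  # drop items with no date
                self.events.append(self._current)
            self._current = None


def scrape(html):
    parser = ListingParser()
    parser.feed(html)
    return parser.events


def escape_text(value):
    # iCalendar TEXT values escape backslash, semicolon, comma and newline.
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(events, venue):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//venue-scraper//EN"]
    for event in events:
        start = event["start"].strftime("%Y%m%dT%H%M%S")  # floating local time
        lines += [
            "BEGIN:VEVENT",
            "UID:" + start + event["url"] + "@venue.example",
            "DTSTAMP:" + stamp,
            "DTSTART:" + start,
            "SUMMARY:" + escape_text(event["title"]),
            "LOCATION:" + escape_text(venue),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end in CRLF


if __name__ == "__main__":
    ical = to_icalendar(scrape(SAMPLE_HTML), "Example Venue")
    sys.stdout.buffer.write(ical.encode("utf-8"))
```

Run from the command line, the script prints a calendar to standard output, so a framework only has to execute each scraper and capture what it prints.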

At the moment the standard format is iCalendar, but I think it'll probably make more sense to use an XML, RDF or JSON serialisation, depending on how complex the data ends up being, and to add iCalendar as an output option once the results have been processed.
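To make the alternative concrete, here is one way the JSON route could work, sketched in Python with the standard library only. The field names (venue, title, start, url), the sample data and the function names are assumptions for illustration rather than a settled format. Scrapers would print a JSON array, the framework would validate and sort it, and iCalendar would be rendered only at the end.

```python
import json
from datetime import datetime, timezone

# Hypothetical scraper output: one JSON object per event, ISO 8601 start times.
SCRAPER_OUTPUT = """
[
  {"venue": "Example Hall", "title": "Live jazz",
   "start": "2011-03-05T20:00:00", "url": "http://venue.example/events/2"},
  {"venue": "Example Hall", "title": "Film night",
   "start": "2011-03-04T19:30:00", "url": "http://venue.example/events/1"},
  {"venue": "Example Hall", "title": "", "start": "2011-03-06T19:00:00"}
]
"""

REQUIRED = ("venue", "title", "start")


def load_events(text):
    """Parse scraper output, drop incomplete records, sort by start time."""
    events = []
    for item in json.loads(text):
        if not all(item.get(key) for key in REQUIRED):
            continue  # skip records with missing or empty required fields
        event = dict(item)
        event["start"] = datetime.fromisoformat(item["start"])
        events.append(event)
    return sorted(events, key=lambda e: e["start"])


def escape_text(value):
    # iCalendar TEXT values escape backslash, semicolon, comma and newline.
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(events):
    """Render processed events as iCalendar (no line folding in this sketch)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//events-framework//EN"]
    for event in events:
        start = event["start"].strftime("%Y%m%dT%H%M%S")
        uid = event.get("url") or (start + "@venue.example")
        lines += [
            "BEGIN:VEVENT",
            "UID:" + uid,
            "DTSTAMP:" + stamp,
            "DTSTART:" + start,
            "SUMMARY:" + escape_text(event["title"]),
            "LOCATION:" + escape_text(event["venue"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(to_icalendar(load_events(SCRAPER_OUTPUT)), end="")
```

Keeping the intermediate data as plain dictionaries means other outputs (an HTML page, an Atom feed) could be added alongside iCalendar without touching the scrapers.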

One aim is to make a site that shows what's on today at just the venues I'm interested in, but I'm sure there'll be other uses once the data is available. Hopefully it'll also show venues the value of publishing decent structured events data.
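Once the events are in a structured form, the "on today" view is little more than a filter. A small Python sketch, with made-up venues and events and a hypothetical events_on helper:

```python
import sys
from datetime import date, datetime

# Made-up events, as they might look after the scrapers' output is processed.
EVENTS = [
    {"venue": "Example Hall", "title": "Film night",
     "start": datetime(2011, 3, 4, 19, 30)},
    {"venue": "Example Hall", "title": "Live jazz",
     "start": datetime(2011, 3, 5, 20, 0)},
    {"venue": "Other Place", "title": "Quiz",
     "start": datetime(2011, 3, 4, 18, 0)},
    {"venue": "Example Cinema", "title": "Matinee",
     "start": datetime(2011, 3, 4, 14, 0)},
]

# The venues this particular reader cares about (also made up).
WANTED_VENUES = {"Example Hall", "Example Cinema"}


def events_on(day, events, venues):
    """Return events at the chosen venues starting on `day`, earliest first."""
    matches = [e for e in events
               if e["venue"] in venues and e["start"].date() == day]
    return sorted(matches, key=lambda e: e["start"])


if __name__ == "__main__":
    # Optional ISO date argument (e.g. 2011-03-04); defaults to today.
    day = date.fromisoformat(sys.argv[1]) if len(sys.argv) > 1 else date.today()
    for event in events_on(day, EVENTS, WANTED_VENUES):
        print(event["start"].strftime("%H:%M"), event["venue"], event["title"])
```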

The code's on GitHub; if you have scrapers to contribute, send me an email or a pull request.