Fair use

The concept of fair use seems to be coming under debate lately. In one ear is Jon Udell, encouraging us to ease information out of its stultifying prison, while in the other ear Mark Pilgrim and commentators are rightly pointing out that the accompanying Terms and Conditions are still the rules that determine what you can do with that data, however foolish and restrictive they may be.

Jack Valenti says that fair use doesn't exist in law (but then goes on to mention some fairly compelling examples of fair use), while DVD rippers and music library sharing applications spread like wildfire. So should we try to engage companies and other information providers in an attempt to adjust or work around their licensing restrictions (and either sit and wait for a reply that never comes or accept the almost inevitable "sorry, we're afraid reuse of our data is not allowed"), or do we just go ahead and scatter the info spoor far and wide, then wait for the companies to catch on?

I've experienced a lot of this lately, as I try to sew together various information sources for HubMed and experiment with the wondrous ability of server-side scripts to rip and mix data into new services. The NCBI E-Utilities web service that provides XML data from the PubMed medical literature database is superb, and the data is covered by copyright terms and conditions that explicitly permit fair use for research and education. On the other hand, the companies that publish these journals are unwilling to release the full text of their (mostly publicly funded) papers into the public domain, for fear of losing the branding opportunities that come with forcing a visit to their site to download each file. Fair enough, I suppose, but this prevents anyone from establishing a central full-text archive with automated citation linking, full-text searching, easy downloading and, most importantly, free and open access. I think in this case (and many others) a reevaluation of the copyright agreements (shortening their term to as little as one year after publication) would be a good idea.

Companies such as Google and Amazon have seen the potential of making their data freely available - their web services allow anyone with a licence key to produce their own applications. This has enabled the Google- and Amazon-Browsers, Googlism and Amazon Light, amongst many others. For Amazon, this cheap service drives customers to their site, while the benefits for Google are publicity, innovation and eventually commercial licensing of their web services.

Publicity and advertising seem to be the main bargaining point here. Liberating data for your own personal use seems to be acceptable to everyone, but when it comes to producing a publicly available, non-profit/experimental service, companies aren't so happy to play along, especially when they have a monopoly on the data. Despite the promise of free advertising, the Terms and Conditions stranglehold keeps a lid on most innovation. For example, should we be able to build a community-driven recommendation service that uses restricted TV listings data (if we write our own programme summaries)? Should we be able to make an AMG2Magnet bookmarklet that takes data from allmusic.com and turns it into magnet URLs? Should we be able to scrape the HTML from CiteSeer and build a graph visualisation application? Should we be able to sample copyrighted music and make bootleg remixes?

If fair use really is, or can be, defined in law, I'd like to know how far it extends to making data available publicly, as well as privately.