Creating a citable archive of a web page


Academic papers or weblog posts often need to refer to external web pages; generally, you want people to see the external pages as they were when you wrote about them.

The simplest way to do this is a standard hyperlink, combined with a quote of the appropriate section of the text. If you're referencing long pages, though, lengthy quotes can quickly get out of hand.

You could save the external web page and host a copy of it locally, but this is troublesome and could be unreliable (as the URLs of local copies tend to change over time).

You could point to The Wayback Machine's cache of the page, which Simpy does for stored bookmarks, but you can't guarantee that it will have a cache of the page from when you looked at it, as there's no way to trigger an import into the archive. You could also use Google's cache, but that only stores the most recently crawled version of the page.

You could use a bookmarking service such as Furl or Yahoo's My Web, which store a cached version of the web page when you bookmark it. This is a good solution, but it only allows you to store one cached version of each page, so if you bookmark the same page again in the future it will overwrite the original cache (though adding a random string, or the date, to the end of the URL would perhaps be one way to get around this limitation).
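As a rough illustration of the date workaround, here is a minimal Python sketch that appends a date-stamped query parameter to a URL. The function name dated_url and the parameter name "archived" are made up for this example, and it rests on two untested assumptions: that the bookmarking service treats the modified URL as a different page, and that the target site ignores query parameters it doesn't recognise.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dated_url(url, when=None, param="archived"):
    """Return url with a date-stamped query parameter appended.

    'param' is a hypothetical name; any parameter the target site
    ignores would do.
    """
    when = when or date.today()
    parts = urlsplit(url)
    # Keep any existing query parameters and add ours at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, when.isoformat()))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(dated_url("http://example.com/page?id=7", date(2006, 3, 1)))
# prints http://example.com/page?id=7&archived=2006-03-01
```

Bookmarking the result on different days would then give each snapshot its own address, if the assumptions above hold.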

Or, you could use a service that some colleagues of mine have produced called WebCite. Triggering the import of a web page via a bookmarklet stores a dated copy of that page in the archive (if that's allowed by the original site), which can then be referred to - indefinitely - by a unique URL. Another way to import referenced web pages is by uploading a paper that will be automatically parsed for hyperlinks: this is a model that could be used to provide ongoing support for the archive, as member publishers use it to cache web pages referred to by their articles.
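To give a feel for the "parse a paper for hyperlinks" step, here is a small Python sketch that collects the external links from an HTML document. It is only an illustration: I'm assuming the paper is submitted as HTML, the names LinkCollector and extract_links are invented for this example, and none of it is WebCite's actual parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip in-page anchors, mailto: links and the like.
            if name == "href" and value and value.startswith(("http://", "https://")):
                if value not in self.links:  # record each URL once, in order
                    self.links.append(value)


def extract_links(html_text):
    """Return the unique external links in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


paper = (
    '<p>See <a href="http://example.com/a">A</a>, a <a href="#fn1">note</a>, '
    '<a href="http://example.com/a">A again</a> and '
    '<a href="https://example.org/b">B</a>.</p>'
)
print(extract_links(paper))
# prints ['http://example.com/a', 'https://example.org/b']
```

Each URL in the resulting list is a candidate for a dated snapshot in the archive.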

Obviously there will have to be limits on how much anyone can use the WebCite archive - it shouldn't be used to make a backup of every page on a site every day, for example - but where those limits fall will probably depend on the patterns of usage.