Citing With URIs in Google Docs

I built a script that runs in Google Docs, turning inline citations into a formatted bibliography. It lets you cite using DOIs, Mendeley library IDs, or any URL that returns metadata as JSON. It's a first, basic attempt, but here's why I like it:

Citing With URIs

The most straightforward way to cite something in a document is to insert an identifier. On the web we hyperlink using URLs, which provide a unique identifier for the item being referenced, with the added bonus of being able to follow that URL to retrieve the item. When writing a scholarly article, however, there's still an expectation that the metadata for a citation will be provided, so that the reference will still make sense even if the URL stops working.

Citing with identifiers therefore depends on being able to retrieve the metadata for each identifier, and the simplest way to do that is to convert the identifier to a URL (if it isn't one already) and retrieve it using an HTTP request.
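As a rough illustration of that conversion step, here is a minimal sketch. The function name identifierToUrl and the "doi:" prefix convention are assumptions made for this example rather than a description of the actual script; the only external fact relied on is that https://doi.org/ resolves DOIs.

```javascript
// Hypothetical helper: turn a citation identifier into a URL that can be fetched.
// Identifiers that are already URLs pass through unchanged.
function identifierToUrl(identifier) {
  if (/^https?:\/\//i.test(identifier)) {
    return identifier;
  }
  // Assumed convention: DOIs are written as "doi:<the DOI>".
  const doiMatch = /^doi:(.+)$/i.exec(identifier);
  if (doiMatch) {
    // encodeURI leaves the "/" inside the DOI intact but escapes spaces and similar.
    return "https://doi.org/" + encodeURI(doiMatch[1]);
  }
  // Other identifier schemes would each need their own rule.
  throw new Error("Unsupported identifier: " + identifier);
}
```

In Google Apps Script, the resulting URL could then be passed to UrlFetchApp.fetch to make the HTTP request.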

Once we have the metadata for each citation, all that's needed is to generate a bibliography (a list of endnotes) at the end of the document, and insert links to those references inline. As a complication, there are many different publishing systems, and they each have their own special preferred formatting for those inline citations and bibliographies, so the tool should ideally be able to cater for any of those formats.
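One way to picture the "many formats" problem is to treat each format as a small function and pick one by name. The sketch below is only an illustration: the record shape (authors, title, journal, year), the two style names and the placeholder reference are all invented for the example, and a real tool would more likely delegate to a proper citation style processor.

```javascript
// Hypothetical styles: each one turns a reference record plus its number into a line.
const styles = {
  numeric: (ref, n) =>
    n + ". " + ref.authors.join(", ") + ". " + ref.title + ". " +
    ref.journal + " (" + ref.year + ").",
  authorYear: (ref) =>
    ref.authors.join(", ") + " (" + ref.year + "). " + ref.title + ". " +
    ref.journal + ".",
};

// Format every reference with the chosen style, one entry per line.
function formatBibliography(references, styleName) {
  const style = styles[styleName];
  if (!style) {
    throw new Error("Unknown style: " + styleName);
  }
  return references.map((ref, i) => style(ref, i + 1)).join("\n");
}

// Placeholder data, not a real publication.
const exampleReferences = [
  {
    authors: ["A. Author", "B. Writer"],
    title: "An Example Title",
    journal: "Journal of Examples",
    year: 2011,
  },
];
```

Calling formatBibliography(exampleReferences, "numeric") gives a numbered entry, while "authorYear" gives the same record in an author-and-year layout.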

There is a need for citation software that works with Google Docs, as it's basically the standard online writing tool (and is continually getting more awesome). I've managed to get the first steps of a citation processor working in Google Docs; it's not complete yet...

Inserting and Processing Citations

Google Apps Script provides a way to add menu items to Google Docs and call a function when a menu item is selected. It's server-side JavaScript, with an online editor that functions well. You can currently only attach scripts to Google Spreadsheets, but that's ok in this case: we need somewhere to store a local copy of our references.
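For anyone who hasn't used Apps Script before, a menu can be wired up roughly like this. It's a sketch, not the actual script: the function names excitingMenuEntries and generateBibliography are made up for the example, and the menu labels are borrowed from the steps further down. The onOpen trigger and Spreadsheet.addMenu are part of Apps Script itself.

```javascript
// Each entry maps a menu label to the name of a global function to call.
function excitingMenuEntries() {
  return [
    { name: "Generate Bibliography", functionName: "generateBibliography" },
  ];
}

// Apps Script runs onOpen automatically when the spreadsheet is opened.
// SpreadsheetApp only exists inside the Apps Script environment.
function onOpen() {
  SpreadsheetApp.getActiveSpreadsheet().addMenu("Exciting", excitingMenuEntries());
}

// Hypothetical handler: the real work (copying the document, fetching
// metadata, writing the bibliography) would go here.
function generateBibliography() {
  // intentionally left empty in this sketch
}
```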

Here's how to use Google Apps Script to format citations in a Google Document:

  1. Create a new Document in Google Docs and give it a unique title.
  2. Write your article, adding citations inline in the form {{cite:doi:10.1038/nchem.1108}}.
  3. Create a new Spreadsheet in Google Docs and give it a title which is the same as the document, but with " - References" at the end.
  4. Add my Exciting script to the spreadsheet (Tools > Script Editor). Once it's installed, an "Exciting" menu should appear.
  5. From the "Exciting" menu, select "Generate Bibliography".

The script will now create a copy of the document (which must be in the same folder as the spreadsheet, and have the same name minus the " - References" suffix); the original document will remain untouched. The script then parses the document for {{cite}} strings, fetches the metadata for each one, and stores the data in the current spreadsheet (if the script is run a second time, it will use this local data instead of fetching it again). Finally, it replaces the inline citations with numbered references, adds a formatted bibliography at the end of the document, emails you a PDF of the final, formatted document, and moves the formatted copy of the document to the trash.
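The parsing and numbering part of that process can be sketched in plain JavaScript. Everything here is an assumption for illustration: the "[1]" marker style, the function names, and the use of a Map as a stand-in for the spreadsheet that holds the local copy of the references.

```javascript
// Matches {{cite:...}} and captures the identifier after "cite:".
const CITE_PATTERN = /\{\{cite:([^}]+)\}\}/g;

// Replace each citation with a number, assigned in order of first appearance.
// Repeated identifiers reuse the number they were given the first time.
function numberCitations(text) {
  const identifiers = [];
  const numbers = new Map();
  const body = text.replace(CITE_PATTERN, (match, rawId) => {
    const id = rawId.trim();
    if (!numbers.has(id)) {
      identifiers.push(id);
      numbers.set(id, identifiers.length);
    }
    return "[" + numbers.get(id) + "]";
  });
  return { body, identifiers };
}

// Only call fetchMetadata for identifiers that aren't stored locally yet.
// `cache` is a Map here; in the setup described above it would be the spreadsheet.
function lookupWithCache(identifier, cache, fetchMetadata) {
  if (!cache.has(identifier)) {
    cache.set(identifier, fetchMetadata(identifier));
  }
  return cache.get(identifier);
}
```

The identifiers array comes back in citation order, so it can be used directly to build the numbered bibliography.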

[NB: this is a first attempt, written last weekend. The citation formatting is very, very basic.]

There are several ways to cite using this system, and this is where it gets most interesting:

Theoretically, you can cite any URL, and the script will retrieve the metadata from that URL and make use of it. In practice, not nearly as many URLs as I'd like perform content negotiation and return JSON instead of HTML from the same URL, and even when they do there's no standard format for the reference metadata (which is where RDF comes in, but there's no RDF parser in Google Apps Script; RDF triples as JSON would be a good intermediate). The current script has custom functions to normalise the data returned from CrossRef and Mendeley into a single, standard format for local use; adding other sources would probably require a custom parser for their metadata as well.
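To give a feel for what that normalisation involves, here is a sketch with one function per source. It is not the code from the script. The first function assumes CSL-style JSON (the kind returned for a DOI when the request's Accept header is application/vnd.citationstyles.csl+json); the field names in the second (forename, surname, publication_outlet) are my assumption of a Mendeley-like shape and may not match the real API; the output record shape is invented for the example.

```javascript
// Normalise CSL-style JSON into a simple local record.
function normaliseCslJson(item) {
  const dateParts = (item.issued && item.issued["date-parts"]) || [];
  return {
    title: item.title || "",
    authors: (item.author || []).map((a) =>
      [a.given, a.family].filter(Boolean).join(" ")),
    journal: item["container-title"] || "",
    year: dateParts.length && dateParts[0].length ? dateParts[0][0] : null,
  };
}

// Normalise an assumed Mendeley-like shape into the same local record.
function normaliseMendeleyLike(item) {
  return {
    title: item.title || "",
    authors: (item.authors || []).map((a) =>
      [a.forename, a.surname].filter(Boolean).join(" ")),
    journal: item.publication_outlet || "",
    year: item.year ? Number(item.year) : null,
  };
}

// Adding another source means adding another entry here.
const normalisers = { csl: normaliseCslJson, mendeley: normaliseMendeleyLike };

function normalise(source, item) {
  const fn = normalisers[source];
  if (!fn) {
    throw new Error("No normaliser for source: " + source);
  }
  return fn(item);
}
```

Whatever the source, the rest of the script then only has to deal with one record shape.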

I'm not able to enter the Mendeley/PLoS API Binary Battle, but I'd be delighted if anyone who's interested were to take this code and make use of it. I see the next steps like this, possibly: 1) get citeproc-node running on a node.js server somewhere (Heroku or Joyent, maybe), and use that for formatting the references; 2) use the UI Services/GUI Builder in Google Apps Script to build an editing interface, for tidying up references once they've been retrieved; 3) add the ability to specify custom formatting for the inline citations, and to choose the citation format for the bibliography.