Citing Articles Within Articles

When writing a hypertext document, you might reference another work, linking to it using either descriptive text or the title of the work:

This sentence refers to the book <a href="http://openlibrary.org/works/OL102749W">Moby Dick</a>, by Herman Melville.

When including a quote from another work, mark up the quote using the <blockquote> tag (with a "cite" attribute to say where the quote came from), and optionally add a <cite> tag to show the source of the quote:

<blockquote cite="http://openlibrary.org/works/OL102749W">The liquor soon mounted into their heads, as it generally does even with the arrantest topers newly landed from sea, and they began capering about most obstreperously.</blockquote>
<cite><a href="http://openlibrary.org/works/OL102749W">Moby Dick</a></cite>

This complies with the HTML specification for the <cite> element, which says that it contains the title of the work being cited.

Inline citations

In a scholarly article, though, citations (as support for a statement) have historically been added as footnotes, linked at the end of a statement either in "Author, Year" format:

Theropod dinosaurs such as Tyrannosaurus rex attained masses of 7 or even 10 tonnes (<a href="#ref-1">Hutchinson et al., 2011</a>).

or as small superscript numbers:

Theropod dinosaurs such as Tyrannosaurus rex attained masses of 7 or even 10 tonnes<a style="vertical-align:super;font-size:smaller;" href="#ref-1">1</a>.

The footnote then contained the bibliographic information for the item being cited, in a reference list or bibliography:

<ul id="references">
    <li id="ref-1">Hutchinson JR, Bates KT, Molnar J, Allen V, Makovicky PJ (2011) A computational analysis of limb and body dimensions in Tyrannosaurus rex with implications for locomotion, ontogeny, and growth. PLoS ONE 6(10):e26037</li>
</ul>

Online, though, marking up citations this way makes less sense. As in the first examples, you really want to link directly from the text to the document being cited:

<p>Theropod dinosaurs such as Tyrannosaurus rex attained masses of 7 or even 10 tonnes (<a href="http://dx.doi.org/10.1371/journal.pone.0026037">Hutchinson et al., 2011</a>).</p>

Now we need a) something to say that this particular link is an inline citation (that the work is being referenced in support of the preceding statement), and b) something to associate the inline citation with the bibliographic annotation, so that the bibliographic information can be displayed in a popover and the reader can see in advance what's being cited.

You might think that the <cite> tag would be perfect for (a), but no! The <cite> tag is only to be used to mark up the title of the cited work, so we can only use it in the bibliographic information:

<li id="ref-1">Hutchinson JR, Bates KT, Molnar J, Allen V, Makovicky PJ (2011) <cite><a href="http://dx.doi.org/10.1371/journal.pone.0026037">A computational analysis of limb and body dimensions in Tyrannosaurus rex with implications for locomotion, ontogeny, and growth</a></cite>. PLoS ONE 6(10):e26037</li>

There was a draft of HTML3 which suggested using rev="citation" to denote inline citations, but the "rev" attribute is no longer in the HTML specification and "citation" is not a recognised link relation. That's reasonable, as those semantics wouldn't have been correct anyway (the current document is not a "citation" of the linked document, though the anchor itself could be), but it shows that this use case was being considered back then. There was also some discussion on the microformats wiki about using rel="cite" for this purpose.

Microdata

Instead, let's add some metadata (as microdata attributes) to say that this is an article, and that the bibliographic information is a citation:

<article itemscope itemtype="http://schema.org/Article">
    <header>
        <h1 itemprop="name">Example Article</h1>
    </header>

    <main>
        <section>
            <p>Theropod dinosaurs such as Tyrannosaurus rex attained masses of 7 or even 10 tonnes (<a href="http://dx.doi.org/10.1371/journal.pone.0026037" itemprop="citation" itemscope itemtype="http://schema.org/Article" itemref="ref-1">Hutchinson et al., 2011</a>).</p>
        </section>
    </main>
</article>

<ul id="references">
    <li id="ref-1">Hutchinson JR, Bates KT, Molnar J, Allen V, Makovicky PJ (2011) <cite itemprop="name"><a href="http://dx.doi.org/10.1371/journal.pone.0026037" itemprop="url">A computational analysis of limb and body dimensions in Tyrannosaurus rex with implications for locomotion, ontogeny, and growth</a></cite>. PLoS ONE 6(10):e26037</li>
</ul>

Now a machine would be able to read this document and understand that it's an Article with one citation. It can also understand that the cited work is an Article, with title "A computational analysis of limb and body…" and URL <http://dx.doi.org/10.1371/journal.pone.0026037>.

The machine can also see the inline context in which the document was cited, enabling it to display that snippet to someone viewing the cited document (this is what ReadCube does, in fact).
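As a rough illustration of what "a machine reading this document" could look like, here is a minimal Python sketch built on the standard library's html.parser. It is not a full microdata parser: it assumes well-formed markup with explicit end tags, only follows itemref to the nearest element with a matching id, and takes the href as the value of any property on an anchor. The names CitationReader, read_citations and SAMPLE are made up for this example, and SAMPLE is a trimmed copy of the markup above.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Trimmed copy of the example markup (assumption: explicit end tags everywhere).
SAMPLE = """
<article itemscope itemtype="http://schema.org/Article">
  <p>Theropod dinosaurs attained masses of 7 or even 10 tonnes
  (<a href="http://dx.doi.org/10.1371/journal.pone.0026037" itemprop="citation"
      itemscope itemtype="http://schema.org/Article"
      itemref="ref-1">Hutchinson et al., 2011</a>).</p>
</article>
<ul id="references">
  <li id="ref-1">Hutchinson JR, Bates KT, Molnar J, Allen V, Makovicky PJ (2011)
  <cite itemprop="name"><a href="http://dx.doi.org/10.1371/journal.pone.0026037"
      itemprop="url">A computational analysis of limb and body dimensions in
  Tyrannosaurus rex with implications for locomotion, ontogeny, and
  growth</a></cite>. PLoS ONE 6(10):e26037</li>
</ul>
"""


class CitationReader(HTMLParser):
    """Hypothetical helper: collects citation anchors and itemref targets."""

    def __init__(self):
        super().__init__()
        self.citations = []   # one dict per itemprop="citation" item
        self.properties = {}  # element id -> {property name: value}
        self._open = []       # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        self._open.append({
            "tag": tag,
            "id": attributes.get("id"),
            "prop": attributes.get("itemprop"),
            "is_item": "itemscope" in attributes,
            "href": attributes.get("href"),
            "itemref": (attributes.get("itemref") or "").split(),
            "text": [],
        })

    def handle_data(self, data):
        # Text belongs to every element that is open at this point.
        for element in self._open:
            element["text"].append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1]["tag"] != tag:
            return  # ignore end tags that do not close the innermost element
        element = self._open.pop()
        if element["prop"] is None:
            return
        text = " ".join("".join(element["text"]).split())
        if element["prop"] == "citation" and element["is_item"]:
            self.citations.append({"href": element["href"], "label": text,
                                   "itemref": element["itemref"]})
            return
        # Simplification: anchors contribute their href, everything else its text.
        value = element["href"] if element["tag"] == "a" else text
        # Attach the property to the nearest element (itself included) with an id.
        for owner in [element, *reversed(self._open)]:
            if owner["id"] is not None:
                self.properties.setdefault(owner["id"], {})[element["prop"]] = value
                break


def read_citations(html):
    """Return one dict per inline citation, merged with its itemref metadata."""
    reader = CitationReader()
    reader.feed(html)
    reader.close()
    resolved = []
    for citation in reader.citations:
        item = {"href": citation["href"], "label": citation["label"]}
        for ref in citation["itemref"]:
            item.update(reader.properties.get(ref, {}))
        resolved.append(item)
    return resolved


for item in read_citations(SAMPLE):
    print(item)
```

Each resulting dict combines what the anchor says (where the link goes, and the "Author, Year" label) with what the reference list says (the "name" and "url" of the cited work), which is the association that itemref is meant to provide.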

Citing specific parts of an article

There is still a problem, though: nothing indicates which part of the cited document is being cited. If the citing URL had a fragment on the end, e.g. "http://dx.doi.org/10.1371/journal.pone.0026037#section-2", which corresponded to the id of an element in the target document, that would be helpful. There have also been experiments with using XPointer in URLs, or non-standard fragment formats, to address specific parts of the target document; neither of these has good cross-browser support, so they depend on the target site to handle them appropriately using JavaScript.
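To make the fragment idea concrete, here is a small Python sketch that separates a citing URL into the document being cited and the part of it being pointed at, using urldefrag from the standard library. The function name split_citation_target is hypothetical, and whether "section-2" really matches an element id is up to the target document.

```python
from urllib.parse import urldefrag


def split_citation_target(url):
    """Return (document URL, fragment); the fragment is None when absent."""
    document_url, fragment = urldefrag(url)
    return document_url, fragment or None


print(split_citation_target(
    "http://dx.doi.org/10.1371/journal.pone.0026037#section-2"))
```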

Even better would be to include a piece of text from the cited document in the citation, so that the relevant part of the cited document can be detected regardless of format (a PDF, for example). In fact, just a unique snippet of text (even an image of that text) is enough to create a citation, if you're using Google Goggles' OCR and the Google Books database to identify the section being cited.
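Here is a minimal sketch of that kind of matching in Python, assuming the cited document has already been converted to plain text (from HTML, a PDF, or OCR). It normalises whitespace and case, then accepts the snippet only if it occurs exactly once. The function names are hypothetical, and the sample text is the Moby Dick quote from earlier, with line breaks added to imitate a different layout.

```python
import re


def normalise(text):
    """Collapse runs of whitespace and ignore case, so layout does not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def locate_snippet(document_text, snippet):
    """Return the offset of snippet in the normalised document text.

    Returns None if the snippet is empty, missing, or occurs more than once.
    """
    haystack = normalise(document_text)
    needle = normalise(snippet)
    if not needle:
        return None
    first = haystack.find(needle)
    if first == -1:
        return None
    if haystack.find(needle, first + 1) != -1:
        return None  # ambiguous: the snippet does not identify a single place
    return first


cited_text = """The liquor soon mounted into their heads,
as it generally does even with the arrantest topers
   newly landed from sea, and they began capering about most obstreperously."""

print(locate_snippet(cited_text, "arrantest topers newly landed"))
print(locate_snippet(cited_text, "the"))
```

Exact matching like this is the simplest possible approach; text that came out of OCR would probably need fuzzy matching instead.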

Annotations

Now we're getting into annotation territory. What we're really doing is creating an annotation that says "Document A (with metadata X), at position 1 (identified by anchor element A), links to Document B (with metadata Y), at position 2 (identified by text snippet B), and the type of that link is 'citation'". I'm hoping that hypothes.is will make these kinds of annotations easy to create.
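Written down as a data structure, that annotation might look something like the Python sketch below. The class names, field names, selector types and the example.org URL are all invented for illustration; this is not the data model of hypothes.is or of any annotation standard.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Position:
    """One end of the link: a document, its metadata, and a place within it."""
    document_url: str
    selector_type: str  # hypothetical values: "element" or "text-snippet"
    selector: str
    metadata: dict = field(default_factory=dict)


@dataclass
class LinkAnnotation:
    """Document A at position 1 links to Document B at position 2."""
    source: Position
    target: Position
    link_type: str = "citation"


annotation = LinkAnnotation(
    source=Position(
        document_url="http://example.org/example-article",  # placeholder URL
        selector_type="element",
        selector="citation-1",  # id of the citing anchor (assumed to exist)
        metadata={"name": "Example Article"},
    ),
    target=Position(
        document_url="http://openlibrary.org/works/OL102749W",
        selector_type="text-snippet",
        selector="they began capering about most obstreperously",
        metadata={"name": "Moby Dick"},
    ),
)

print(asdict(annotation))
```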

Once you have all that information, you're no longer limited to being transferred from one document to the next via hypertext links: you can start to create a display that builds itself up by transcluding the relevant section of the target document into the source document, wherever it's referenced. You can also, in reverse, transclude sections of documents that cited the current document. With a bit more citation typing, you can also know whether the citing document supported, refuted, used data from, or made some other reference to the target document.
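The reverse lookup could be sketched in Python as an index from each cited document to the documents that cite it, grouped by citation type. The function name, the type labels and the example.org URLs are placeholders, not an existing vocabulary.

```python
from collections import defaultdict


def build_backlinks(annotations):
    """Index (source, target, type) triples by target document, then by type."""
    index = defaultdict(lambda: defaultdict(list))
    for source_url, target_url, link_type in annotations:
        index[target_url][link_type].append(source_url)
    # Convert to plain dicts so lookups of unknown keys fail loudly.
    return {target: dict(by_type) for target, by_type in index.items()}


# Placeholder data: three articles citing the same paper in different ways.
annotations = [
    ("http://example.org/article-a", "http://example.org/paper", "supports"),
    ("http://example.org/article-b", "http://example.org/paper", "refutes"),
    ("http://example.org/article-c", "http://example.org/paper", "supports"),
]

backlinks = build_backlinks(annotations)
print(backlinks["http://example.org/paper"])
```

A page displaying the cited paper could use such an index to list, or transclude sections from, the articles that support it separately from the ones that refute it.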

Summary

Most of the elements and attributes needed for marking up citations are already available in HTML5:

itemprop="citation": A standard attribute (with the schema.org "citation" property as its value) to say that an anchor or span element marks a citation of another work.
itemref: A standard attribute that connects a "citation" anchor or span to the metadata describing the cited work, somewhere else in the current document.
?: Still missing: a standard element or attribute that contains information about which part of the cited document is being referenced. See, for example, how the Google Drive API supports annotation targets for different types of media.