Pimp My Paper!


or... Structured Markup for Scientific Articles.

Following on from a survey of existing methods of publishing the full text of biomedical papers online in (X)HTML, I've produced a page that demonstrates the way I'd like articles to be presented:

Here's the example page

It uses a fake journal and a paper cut together from multiple sources, but if you view the source (particularly within <div id="article">), you'll see that every element carries plenty of semantic markup. The markup is still subject to change, particularly the parts that align with existing microformats, but it works pretty well at the moment.

Here are some of the features:

  • The width of the text block is set as a percentage, so it grows when the browser window is made wider, but it has a max-width in ems, so it a) scales with the font size (meaning that you don't end up with a thin column of large letters) and b) never exceeds a set number of characters per line, so the column remains readable.
  • The article outline is generated dynamically from the h3, h4 and h5 elements, using the getElementsByTagNames function in browsers that support it, and displayed using Yahoo's TreeView UI element. The outline also floats alongside the text, so it's always visible without the need for frames.
  • Formatting of inline citations is controlled by CSS. Clicking on an inline citation [only those in the Introduction are present at the moment] will load the citation details inline, rather than scrolling to the bottom of the page.
  • Titles in the bibliography link directly to the items they refer to, using DOIs where available, rather than being followed by a long list of alternative links or bare URLs. Each reference is also followed by COinS markup, which allows links to each user's own link resolver to be inserted dynamically.
  • Figures and tables are numbered automatically using CSS counters (counter-increment and counter()), which Safari doesn't support.
  • Nested sections are numbered automatically in the same way, though I might remove the numbering eventually as I'm not sure it's necessary (the same goes for the first-line indent on each paragraph).
  • Images and movies are displayed inline: images use Lightbox Plus to display the full, zoomable image when clicked, while movies show a preview image and only load the full movie when clicked. Images are sized in ems, so they scale in proportion to the text, but they have a max-width in pixels so they never grow beyond their original size.
  • Inline abbreviations are defined with <abbr> elements, and the abbreviations list at the end of the article is marked up as a definition list (dl/dt/dd).
  • The PDF version of the article has a prominent icon, a direct download link, and a <link rel="alternate"> element in the head of the page.
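Generating the outline amounts to turning a flat, document-ordered list of headings into a tree. A minimal sketch, assuming the h3/h4/h5 elements have already been collected in document order (for example by getElementsByTagNames) and reduced to { level, text } objects; buildOutline and that object shape are hypothetical, and feeding the tree into the YUI TreeView is not shown:

```javascript
// Hypothetical helper (not from the example page): turn a flat, document-ordered
// list of headings (level 3-5 for h3/h4/h5) into a nested outline tree.
function buildOutline(headings) {
  const root = { level: 0, text: null, children: [] };
  const stack = [root]; // path from the root to the most recently seen heading

  for (const { level, text } of headings) {
    const node = { level, text, children: [] };
    // Pop until the top of the stack is a shallower heading: that is the parent.
    while (stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// In newer browsers, querySelectorAll also returns matches in document order, e.g.:
// Array.from(document.querySelectorAll("#article h3, #article h4, #article h5"))
//   .map((h) => ({ level: Number(h.tagName[1]), text: h.textContent }));

const exampleOutline = buildOutline([
  { level: 3, text: "Introduction" },
  { level: 4, text: "Background" },
  { level: 3, text: "Methods" },
]);
// exampleOutline has two top-level entries; "Background" is nested under "Introduction".
```

A stack keeps this to a single pass, and a skipped level (an h5 directly under an h3) simply nests under the nearest shallower heading.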
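The post doesn't say how its COinS spans are generated. As a minimal sketch, assuming each reference is held as a plain object with hypothetical fields (title, journal, year, volume, startPage, authors, doi), a span can be built from the OpenURL 1.0 (Z39.88-2004) key/value pairs that COinS uses for journal articles. The helper name coinsSpan is likewise hypothetical:

```javascript
// Hypothetical helper: build a COinS span (class "Z3988") for a journal article.
// Field names on `ref` are assumptions; the keys are OpenURL 1.0 KEV journal keys.
function coinsSpan(ref) {
  const pairs = [
    ["ctx_ver", "Z39.88-2004"],
    ["rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"],
    ["rft.genre", "article"],
    ["rft.atitle", ref.title],
    ["rft.jtitle", ref.journal],
    ["rft.date", ref.year],
    ["rft.volume", ref.volume],
    ["rft.spage", ref.startPage],
  ];
  for (const author of ref.authors || []) pairs.push(["rft.au", author]);
  if (ref.doi) pairs.push(["rft_id", "info:doi/" + ref.doi]);

  const kev = pairs
    // Drop fields the reference doesn't have.
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    // Percent-encode keys and values (so any "&" or quote inside a value is escaped).
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");

  // The ContextObject sits in an HTML attribute, so separators become &amp;.
  return '<span class="Z3988" title="' + kev.replace(/&/g, "&amp;") + '"></span>';
}

const exampleSpan = coinsSpan({
  title: "A fake paper",
  journal: "Journal of Examples",
  year: 2006,
  authors: ["Smith, J"],
  doi: "10.1000/xyz",
});
```

Only the separators need the &amp; escape: any ampersand inside a value has already been percent-encoded to %26 by encodeURIComponent.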
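The nested numbering that CSS counters give to sections (a deeper counter restarts whenever a shallower heading appears) can also be expressed in script form, for instance as a fallback for browsers without counter support. A hypothetical sketch, assuming heading levels are supplied in document order; sectionNumbers is not part of the example page:

```javascript
// Hypothetical helper: compute hierarchical section numbers ("1", "1.1", "2", ...)
// from heading levels in document order, mimicking nested CSS counters.
function sectionNumbers(levels) {
  if (levels.length === 0) return [];
  const top = Math.min(...levels);
  const counts = []; // counts[d] = current counter value at depth d

  return levels.map((level) => {
    const depth = level - top;                      // 0 for the shallowest level
    while (counts.length <= depth) counts.push(0);  // a new deeper counter starts at 0
    counts.length = depth + 1;                      // reset counters below this depth
    counts[depth] += 1;                             // increment this level's counter
    return counts.join(".");
  });
}

// sectionNumbers([3, 4, 4, 3, 4]) returns ["1", "1.1", "1.2", "2", "2.1"]
```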


When it's finished, all the code should be under a Creative Commons license, and I'll actively encourage publishers who make the full text of scientific articles available online to use it.