The state of online biomedical full text articles


Introduction

While PDFs of scientific papers are the historical descendants of print-based publication, most journals also make the full text of papers available online as HTML. These pages contain hypertext links, thumbnails of figures and tables, and reference lists linked to external resources. Some people prefer to read and store papers in this format; personally, I've always used PDF, as the HTML pages tended to be badly presented. Web page layouts have been improving, though, so I carried out a survey of HTML full-text pages from the major publishers, aiming to identify a) problems with usability and b) recurring themes in the semantic markup of article elements (which will hopefully lend itself to a microformat recommendation for scientific, or at least biomedical, articles).

The features studied were:

  • The width of the page layout (often the column can be too narrow to read, especially if the text isn't justified).
  • The presence of a visible DOI for the article.
  • The use of an article outline or links to major section headings.
  • The method of linking from in-text citations to the full details of each reference. Usually this scrolls the page to the reference list at the end, so it's useful to have a link back to the citation (though the browser's Back button can often do this too).
  • Presentation of outgoing links from each item in the reference list, used to access cited papers.
  • HTML markup used for the article title, main sections (Abstract, Introduction, Materials and Methods, Results, Discussion, Acknowledgements, References), minor sections, figures, individual references and parts of each reference (author, journal name, year, volume, page numbers, etc.).
  • The DOCTYPE for the document.
  • Whether the document syntax was valid (according to the W3C Markup Validation Service); a small number of errors (fewer than 10) is excusable.
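Some of these features can be checked mechanically. The following is a minimal sketch of a hypothetical helper, assuming the page has been saved locally: Python's standard-library `html.parser` pulls out the DOCTYPE, the heading outline and any DOI-like strings. The class name `FeatureSurvey` and the DOI regular expression are illustrative assumptions, and validity itself is still best checked with the W3C validator.

```python
import re
from html.parser import HTMLParser

# Rough DOI pattern (an approximation of the usual "10.prefix/suffix" shape).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


class FeatureSurvey(HTMLParser):
    """Collect the DOCTYPE, the heading outline and DOI-like strings from a page."""

    def __init__(self):
        super().__init__()
        self.doctype = None   # text of the <!DOCTYPE ...> declaration, if any
        self.headings = []    # (level, text) for each <h1>..<h6>, in document order
        self.dois = []        # DOI-like strings found in text nodes
        self._level = None    # level of the heading currently open, if any
        self._text = []

    def handle_decl(self, decl):
        if decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level, self._text = int(tag[1]), []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Collapse runs of whitespace inside the heading text.
            self.headings.append((self._level, " ".join("".join(self._text).split())))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)
        # Note: <script> contents also arrive here, so this can over-count.
        self.dois.extend(DOI_RE.findall(data))


page = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><body><h1>Article title</h1><p>doi:10.1371/journal.pgen.0020015</p>
<h2>Introduction</h2><h3>Background</h3></body></html>"""

survey = FeatureSurvey()
survey.feed(page)
survey.close()
print(survey.headings)  # [(1, 'Article title'), (2, 'Introduction'), (3, 'Background')]
print(survey.dois)      # ['10.1371/journal.pgen.0020015']
```

The heading list gives a quick view of whether a page uses `<h1>` to `<h4>` for its structure or relies on styled `<p>`/`<b>` elements instead.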

Results

Good things are marked in green, bad things in red and useful markup in yellow.

Publisher | Page format | Visible DOI | Section links | In-article citation links | Outgoing reference links | Markup for title [1] | Markup for main sections | Markup for minor sections | Markup for figures | Markup for each reference | Sub-reference markup | DOCTYPE | Valid (number of errors) | Full-text URL
Wiley | framed | yes | in sidebar and arrows under each major section heading; major sections | same page, not reversible | internal resolver page (ChemPort, PubMed), DOI | .articleTitle | .firstLevelHeading, a#SEC1-1 | .firstLevelHeading, a#SEC2-1 | a#FIG1 | .referenceNum, a#BIB1 | italic, bold | XHTML 1.0 Transitional | no (4) | link
NPG | narrow | yes | in sidebar; major sections | same page, not reversible | Article (DOI), PubMed, ISI, ChemPort | .articletitle | a#Materials_and_methods, .articletype | <strong> | a#fig1 | a#bib1 | .journal, .jnumber | none | no (8) | link
Highwire | full-width | no | inline box after each section heading; major sections | same page, not reversible | [Abstract/Free Full Text] or [CrossRef][Medline] | none | a#BDY, a#SEC1, a#BIBL | <em> | a#F1 | a#R1 | italic, bold | none | no (97) | link
Elsevier (ScienceDirect) | narrow | yes | inline Article Outline; major and minor sections | same page, reversible | Abstract-Elsevier BIOBASE, Abstract-EMBASE, Abstract-MEDLINE, $Order Document, Full Text via CrossRef, Abstract + References in Scopus, Cited By in Scopus | <p> inside <h2> | <h2> | <h3> | a#FIG1 | a#bib1 | italic | none | no (754) | link
Nature | narrow | yes | in sidebar; major sections | same page, not reversible | Article, PubMed, ISI, ChemPort | <h2> | .heading1 | <b> | a#F1 | a#B1 | italic, bold | none | no (184) | link
Blackwell (Synergy) | narrow | yes | in drop-down menu next to each section heading | opened in a new window via JavaScript | SYNERGY Abstract, CrossRef Abstract, MEDLINE Abstract, ISI Abstract, CSA Abstract (image links, opened in a new window via JavaScript) | .abstracttitle | a#h6,12,17,18,19 | .heading2 | f1, opened in a new window via JavaScript | a#b1 | italic, bold | HTML 4.01 Transitional | no (51) | link
Wolters (OVID) | full-width | no | outline in sidebar; major and minor sections | same page, reversible | BIOSIS, MEDLINE (both just text records) | .fulltext-TITLE | .fulltext-LEVEL1 | .fulltext-LEVEL2 | .fulltext-graphic | .fulltext-REFERENCES | italic | HTML 4.01 Strict | no (111) | (expires)
AAAS | medium-width | yes | N/A | same page, not reversible | [CrossRef] [ISI] [Medline] [Abstract/Free Full Text] | <h2> | N/A | N/A | a#FIG1 | .references, a#REF1 | italic, bold | none | no (8) | link
BioMed Central | full-width with 1 margin | yes | repeated links in sidebar; major sections | same page, reversible (including multiple occurrences) | [PubMed Abstract][Publisher Full Text] | .xpapertitle | .subHead | .subBHead | none | a#B1 | italic, bold | none | no (141) | link
PLOS | full-width with 2 margins | yes | inline box after each section heading; major sections | same page, not reversible | Find this article online (leads to resolver page with PubMed/CrossRef/CSA) | <h1> | .classSection1, a#s1, <h3> | <h4> | .figure | a#journal-pgen-0020015-b001 | none | XHTML 1.0 Transitional | no (19) | link

Notes:
[1] For the structural markup columns, '.word' indicates a class name, 'a#word' indicates an anchor and '<tag>' indicates an HTML element.
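The '.word' and 'a#word' notation corresponds directly to CSS selectors, so a page's class names and anchors can be inventoried automatically. A minimal sketch using Python's standard `html.parser`; `MarkupInventory` is a hypothetical name, and the snippet it parses is made up, loosely modelled on the Wiley entries above.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupInventory(HTMLParser):
    """Count class names as '.word' and anchors as 'a#word', as in the table."""

    def __init__(self):
        super().__init__()
        self.tokens = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element may carry several space-separated class names.
        for name in (attrs.get("class") or "").split():
            self.tokens["." + name] += 1
        if tag == "a":
            # Anchors may be marked with name= (common in older HTML) or id=.
            target = attrs.get("id") or attrs.get("name")
            if target:
                self.tokens["a#" + target] += 1


snippet = (
    '<h2 class="firstLevelHeading"><a name="SEC1-1"></a>Introduction</h2>'
    '<a id="FIG1" href="#"><img class="figure thumb" src="f1.gif"></a>'
)
inventory = MarkupInventory()
inventory.feed(snippet)
inventory.close()
print(sorted(inventory.tokens))
# ['.figure', '.firstLevelHeading', '.thumb', 'a#FIG1', 'a#SEC1-1']
```

Counting rather than just listing makes it easy to spot anchors like a#FIG1 and a#B1 that repeat a numbered pattern across a page.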

Conclusions

For page format, pages scored badly where the text was either full-width (pressed right up to the edges of the page) or very narrow (endless scrolling), both of which are hard to read on screen. The choice of font face and size is sometimes not optimal either, but that's a question of style. Most pages displayed a DOI, which lets readers get back to the online source of a paper and look up its metadata when needed. Section links were all fairly sane, though often a bit over the top: I don't think I've ever used them to navigate between sections, yet they often appear prominently throughout the article, and Wiley's 'blind arrows' section navigation is particularly unhelpful. The article outline in Wolters (OVID) was probably the most useful.

Outgoing links from references could definitely be improved, either by using a minimal, useful set of links like BioMed Central, or by providing a single link to a local resolver page that offers more options. It would be nice to see a bit of dynamic HTML, both for the in-text citations (which could expand to show the citation details inline) and for the external links from references (which could expand from a single link to show all the options on demand).
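Two-way citation links of the kind the table calls 'reversible' are simple to generate when the page is built. The sketch below assumes a hypothetical input in which citations are written as `[n]`, and uses an invented id scheme (`cite-n-k`, `ref-n`). Each reference gets one back-link per occurrence, as BioMed Central does, and each citation carries the reference text in a `title` attribute, a script-free stand-in for the inline expansion suggested above rather than a full dynamic-HTML solution.

```python
import html
import re
from collections import defaultdict

# Hypothetical input convention: the body text marks citations as "[n]".
CITE_RE = re.compile(r"\[(\d+)\]")


def link_citations(body: str, references: list[str]) -> tuple[str, str]:
    """Return (body_html, refs_html) with links in both directions."""
    occurrences = defaultdict(list)  # reference number -> ids of its citations

    def link(match: re.Match) -> str:
        n = int(match.group(1))
        if not 1 <= n <= len(references):
            return match.group(0)  # leave unknown citation numbers untouched
        cite_id = f"cite-{n}-{len(occurrences[n]) + 1}"
        occurrences[n].append(cite_id)
        # The title attribute shows the full reference on hover.
        title = html.escape(references[n - 1])
        return f'<a id="{cite_id}" href="#ref-{n}" title="{title}">[{n}]</a>'

    body_html = CITE_RE.sub(link, html.escape(body))

    items = []
    for n, text in enumerate(references, start=1):
        # One back-link per occurrence, so the reader can return to any of them.
        back = "".join(f' <a href="#{c}">^</a>' for c in occurrences.get(n, []))
        items.append(f'<li id="ref-{n}">{html.escape(text)}{back}</li>')
    return body_html, "<ol>\n" + "\n".join(items) + "\n</ol>"


body_html, refs_html = link_citations(
    "Shown before [1] and [2]; see also [1]. Not [9].",
    ["Smith J. Title. J Biol 2005.", "Doe A. Other. J Chem 2006."],
)
print(body_html)
print(refs_html)
```

Numbering each occurrence separately is what makes the back-links unambiguous when a paper is cited more than once.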

The semantic markup of pages was generally poor; the best were Wolters (OVID), for generally useful class names; PLOS, for using <h1> to <h4> headings properly; and BioMed Central, for putting class names on each element.
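A microformat recommendation of the kind mentioned in the introduction would largely come down to agreeing on class names for each part of a reference. As a sketch only, here is one way a single reference might be rendered with a class name on every part and one outgoing link via the doi.org resolver. The `Reference` fields and every class name (`reference`, `ref-author`, `ref-journal`, and so on) are hypothetical, not taken from any publisher or published microformat.

```python
import html
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class Reference:
    authors: list[str]
    title: str
    journal: str
    year: int
    volume: str
    pages: str
    doi: str = ""


def span(cls: str, value) -> str:
    """Wrap one escaped value in a <span> carrying a descriptive class name."""
    return f'<span class="{cls}">{html.escape(str(value))}</span>'


def reference_to_html(ref: Reference, number: int) -> str:
    """Render one reference list item; every class name here is hypothetical."""
    authors = ", ".join(span("ref-author", a) for a in ref.authors)
    body = (
        f"{authors}. {span('ref-title', ref.title)}. "
        f"<i>{span('ref-journal', ref.journal)}</i> {span('ref-year', ref.year)}; "
        f"{span('ref-volume', ref.volume)}:{span('ref-pages', ref.pages)}."
    )
    if ref.doi:
        # A single outgoing link via the DOI resolver instead of a row of database links.
        url = "https://doi.org/" + quote(ref.doi, safe="/();:")
        body += f' <a class="ref-doi" href="{url}">{html.escape(ref.doi)}</a>'
    return f'<li class="reference" id="ref-{number}">{body}</li>'


example = Reference(["Smith J", "Doe A"], "An example title", "J Biol", 2006,
                    "12", "1-10", "10.1000/xyz123")
print(reference_to_html(example, 1))
```

Keeping italics as a wrapper around a classed element, rather than as the only markup, means the visual style can stay the same while software can still pick out the journal name.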

One thing I didn't cover in the table is the presentation of figures, which has always been annoying. Opening each image in a new window, which is always too small, isn't a quick way to view figures. Science (AAAS) has the best method of displaying figures: it shows good-sized thumbnails and captions inline, then loads a new HTML page in the same window with a reasonably sized version of the image, linked to the full-size image itself, and a link to download the figure as a PowerPoint slide for use in teaching.

In summary, considering that these are professional publishers, a little effort put into making the full-text HTML pages readable, valid and well structured could go a long way towards making this publication format more useful.