Introduction
While PDFs of scientific papers are the historical descendants of print-based publication, most journals also make the full text of papers available online in HTML format. These pages contain hypertext links, thumbnails of figures and tables, and reference lists that are linked to external resources. Some people prefer to read and store papers in this format; personally I've always used PDF, as the HTML pages tended to be badly presented. Web page layouts have been improving though, so I carried out a survey of HTML fulltext pages from the major publishers with the aim of identifying a) problems with usability and b) recurring themes in the semantic markup of article elements (which will hopefully lend themselves to a microformat recommendation for scientific, or at least biomedical, articles).
The features studied were:
- The width of the page layout (often the column can be too narrow to read, especially if the text isn't justified).
- The presence of a visible DOI for the article.
- The use of an article outline or links to major section headings.
- The method of linking from in-text citations to the full details of each reference. Usually this scrolls the page to the reference list at the end, so it's useful to have a link back to the citation (though the browser's Back button can often do this too).
- Presentation of outgoing links from each item in the reference list, used to access cited papers.
- HTML markup used for the article title, main sections (Abstract, Introduction, Materials and Methods, Results, Discussion, Acknowledgements, References), minor sections, figures, individual references and parts of each reference (author, journal name, year, volume, page number etc).
- The DOCTYPE for the document.
- Whether the document syntax was valid according to the W3C Markup Validation Service; a low number of errors (<10) is excusable.
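Two of these checks (the DOCTYPE and the visible DOI) are simple enough to automate. The following is only a rough sketch of how that might be done with Python's standard library, not the method used for the survey. The `PageSurvey` class, the `survey` function, the DOI regular expression and the sample page (including its DOI) are all assumptions made for illustration, and the sketch does not attempt validation, which still needs the W3C service.

```python
import re
from html.parser import HTMLParser

# Assumed DOI pattern: "10.", a registrant code, "/", then a suffix.
# Real DOIs can be looser than this, so treat it as a heuristic.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


class PageSurvey(HTMLParser):
    """Collects the DOCTYPE declaration and the first DOI found in page text."""

    def __init__(self):
        super().__init__()
        self.doctype = None      # stays None when the page declares no DOCTYPE
        self.doi = None
        self._hidden_depth = 0   # > 0 while inside <script> or <style>

    def handle_decl(self, decl):
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Ignore text inside scripts and styles; keep only the first DOI seen.
        if self._hidden_depth or self.doi:
            return
        match = DOI_PATTERN.search(data)
        if match:
            self.doi = match.group(0).rstrip(".,;)")


def survey(html_text):
    parser = PageSurvey()
    parser.feed(html_text)
    parser.close()
    return {"doctype": parser.doctype, "doi": parser.doi}


# Hypothetical page; both DOIs are made up for illustration.
sample = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>Example</title>
<script>var hidden = "10.9999/not-visible";</script></head>
<body><h1>An example article</h1><p>doi:10.1234/example.0001</p></body></html>"""

report = survey(sample)
print(report["doctype"])
print(report["doi"])
```

A page with no DOCTYPE, like several in the results table, would simply report `None` for that field.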
Results
Good things are marked in green; bad things are marked in red; useful markup is marked in yellow.
Publisher | Page format | Visible DOI | Section links | In-article citation links | Outgoing reference links | Markup for title [1] | Markup for main sections | Markup for minor sections | Markup for figures | Markup for each reference | Sub-reference markup | DOCTYPE | Valid (number of errors) | Fulltext URL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Wiley | framed | yes | In sidebar and arrows under each major section heading; major sections | same page, not reversible | Internal resolver page (ChemPort, PubMed), DOI | .articleTitle | .firstLevelHeading, a#SEC1-1 | .firstLevelHeading, a#SEC2-1 | a#FIG1 | .referenceNum, a#BIB1 | italic, bold | XHTML 1.0 Transitional | no (4) | link |
NPG | narrow | yes | in sidebar; major sections | same page, not reversible | Article (DOI), PubMed, ISI, ChemPort | .articletitle | a#Materials_and_methods, .articletype | <strong> | a#fig1 | a#bib1 | .journal, .jnumber | none | no (8) | link |
Highwire | full-width | no | inline box after each section heading; major sections | same page, not reversible | [Abstract/Free Full Text] or [CrossRef][Medline] | none | a#BDY, a#SEC1, a#BIBL | <em> | a#F1 | a#R1 | italic, bold | none | no (97) | link |
Elsevier (ScienceDirect) | narrow | yes | inline Article Outline; major and minor sections | same page, reversible | Abstract-Elsevier BIOBASE, Abstract-EMBASE, Abstract-MEDLINE, $Order Document, Full Text via CrossRef, Abstract + References in Scopus, Cited By in Scopus | <p> inside <h2> | <h2> | <h3> | a#FIG1 | a#bib1 | italic | none | no (754) | link |
Nature | narrow | yes | in sidebar; major sections | same page, not reversible | Article, PubMed, ISI, ChemPort | <h2> | .heading1 | <b> | a#F1 | a#B1 | italic, bold | none | no (184) | link |
Blackwell (Synergy) | narrow | yes | in drop-down menu next to each section heading | use Javascript to open in new window | SYNERGY Abstract CrossRef Abstract MEDLINE Abstract ISI Abstract CSA Abstract; image links, use Javascript to open in new window | .abstracttitle | a#h6,12,17,18,19 | .heading2 | f1, javascripted new window | a#b1 | italic, bold | HTML 4.01 Transitional | no (51) | link |
Wolters (OVID) | full-width | no | outline in sidebar; major and minor sections | same page, reversible | BIOSIS, MEDLINE (both just text records) | .fulltext-TITLE | .fulltext-LEVEL1 | .fulltext-LEVEL2 | .fulltext-graphic | .fulltext-REFERENCES | italic | HTML 4.01 Strict | no (111) | (expires) |
AAAS | medium-width | yes | N/A | same page, not reversible | [CrossRef] [ISI] [Medline] [Abstract/Free Full Text] | <h2> | N/A | N/A | a#FIG1 | .references, a#REF1 | italic, bold | none | no (8) | link |
BioMed Central | full-width with 1 margin | yes | repeated links in sidebar; major sections | same page, reversible (including multiple occurrences) | [PubMed Abstract][Publisher Full Text] | .xpapertitle | .subHead | .subBHead | none | a#B1 | italic, bold | none | no (141) | link |
PLOS | full-width with 2 margins | yes | inline box after each section heading; major sections | same page, not reversible | Find this article online (leads to resolver page with PubMed/CrossRef/CSA) | <h1> | .classSection1, a#s1, <h3> | <h4> | .figure | a#journal-pgen-0020015-b001 | none | XHTML 1.0 Transitional | no (19) | link |
Notes:
[1] For the structural markup, '.word' indicates a class name and 'a#word' indicates an anchor.
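To make the notation concrete, here is a small sketch that interprets a table entry and finds the elements it would match in a page. It is an illustration only: the `matches` and `find` helpers are hypothetical, the sample fragment just reuses names from the Wiley row, and treating an anchor as an `<a>` with either an `id` or a `name` attribute is an assumption.

```python
from html.parser import HTMLParser


def matches(notation, tag, attrs):
    """Return True if an element matches one entry in the table's notation."""
    attrs = dict(attrs)
    if notation.startswith("."):
        # '.word' -> element carrying that class name
        return notation[1:] in (attrs.get("class") or "").split()
    if notation.startswith("a#"):
        # 'a#word' -> anchor; assumed to be identified by id or name
        return tag == "a" and notation[2:] in (attrs.get("id"), attrs.get("name"))
    if notation.startswith("<") and notation.endswith(">"):
        # '<h2>' -> bare tag with no class or anchor
        return tag == notation[1:-1].lower()
    return False


class NotationFinder(HTMLParser):
    """Records the tag name of every element matching the given notation."""

    def __init__(self, notation):
        super().__init__()
        self.notation = notation
        self.found = []

    def handle_starttag(self, tag, attrs):
        if matches(self.notation, tag, attrs):
            self.found.append(tag)


def find(notation, html_text):
    finder = NotationFinder(notation)
    finder.feed(html_text)
    finder.close()
    return finder.found


# Hypothetical fragment that reuses names from the table above.
sample = (
    '<div class="articleTitle">Title</div>'
    '<a name="SEC1-1"></a><h2 class="firstLevelHeading">Introduction</h2>'
    '<a id="BIB1"></a><span class="referenceNum">1</span>'
)

print(find(".firstLevelHeading", sample))
print(find("a#SEC1-1", sample))
print(find("<h2>", sample))
```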
Conclusions
For the page format results, pages scored badly where text was either full-width (pressed right up to the edges of the page) or very narrow (endless scrolling), both of which are difficult to read online. The choice of font face and size is sometimes not optimal either, but that's a question of style. Most articles displayed a DOI, which allows readers to get back to the online source of a paper and to look up metadata when needed. Section links were all fairly sane, though often a bit over the top: I don't think I've ever used them to navigate between sections, but they often appear prominently throughout the article, and Wiley's 'blind arrows' section navigation is particularly unhelpful. The article outline in Wolters (OVID) was probably the most useful. Outgoing links from references could definitely be improved, either by using a minimal, useful number of links like BioMed Central, or by just providing one link to a local resolver page which would provide more options. It would be nice to see a bit of dynamic HTML, both for the in-text citations (which could expand to show the citation details inline) and for the external links from references (which could expand from a single link to show all the options on demand).
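Both inline expansion and reversible citation links depend on the same piece of data: a map from each reference to its text and to the places that cite it. As a rough sketch (in Python rather than in-browser script, and not taken from any publisher's implementation), that map could be built as follows. The `CitationIndex` class, the `build_index` function and the sample markup are hypothetical, and the sketch assumes each reference is an `<li>` carrying the anchor id, which is not how most of the surveyed pages mark up references.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CitationIndex(HTMLParser):
    """Collects reference entries and the same-page links that point at them."""

    def __init__(self):
        super().__init__()
        self.cited_from = defaultdict(list)   # target id -> ids of citing links
        self.reference_text = {}              # reference id -> text of the entry
        self._current_ref = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#"):
            # Same-page link; it may be a citation or a section link.
            self.cited_from[href[1:]].append(attrs.get("id"))
        elif tag == "li" and attrs.get("id"):
            # Assumption: each reference is an <li> carrying the anchor id.
            self._current_ref = attrs["id"]
            self.reference_text[self._current_ref] = ""

    def handle_endtag(self, tag):
        if tag == "li":
            self._current_ref = None

    def handle_data(self, data):
        if self._current_ref is not None:
            self.reference_text[self._current_ref] += data


def build_index(html_text):
    parser = CitationIndex()
    parser.feed(html_text)
    parser.close()
    # Keep only link targets that are reference entries (drops section links).
    return {
        ref_id: {"text": text.strip(), "cited_from": parser.cited_from.get(ref_id, [])}
        for ref_id, text in parser.reference_text.items()
    }


# Hypothetical article fragment; the references are made up.
sample = """
<p><a href="#SEC2">Skip to Results</a></p>
<p>Earlier work <a id="cite-1" href="#B1">[1]</a> was later extended
<a id="cite-2" href="#B2">[2]</a>, confirming the first result
<a id="cite-3" href="#B1">[1]</a>.</p>
<ol>
  <li id="B1">Author A. First hypothetical paper. J Example 2005.</li>
  <li id="B2">Author B. Second hypothetical paper. J Example 2006.</li>
</ol>
"""

index = build_index(sample)
for ref_id, entry in index.items():
    print(ref_id, entry["cited_from"], entry["text"])
```

With this in hand, a page script could show `entry["text"]` next to a citation on demand, and each reference could link back to every place it is cited, including multiple occurrences.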
The semantic markup of pages was generally poor, but the best were Wolters (OVID; generally useful class name markup), PLOS (for using <h1>-<h4> properly) and BioMed Central (for putting class names on each element).
One thing I didn't mention was the presentation of figures, which has always been annoying. Opening a new window, which is always too small, for each image isn't a quick way to view figures. Science (AAAS) has the best method of displaying figures: it has good-sized thumbnails and captions inline, then loads a new HTML page in the same window, which has a reasonably sized version of the image (linked to the full-size image itself) and a link to download the figure as a PowerPoint slide for use in teaching.
In summary, considering that these are professional publishers, a bit of effort put into making the fulltext HTML pages readable, valid and well structured could go a long way to making this publication format more useful.