Analysing the ticTOCs collection of journal TOC feeds

Academic publishers produce RSS feeds of the articles in the latest issue of each of their journals. These Table of Contents (TOC) feeds may be rich in metadata (mainly when they're published as RSS 1.0, which is RDF/XML), or may use a generic subscription format such as RSS 2.0 or Atom.

The ticTOCs project, funded by JISC, aims to increase the quantity and quality of TOC feeds produced by publishers. The project recently made available the list of feed URLs it has collected.

I fetched all the listed feeds (just over 13,000), converted them to RDF using rss2rdf.xsl (RSS 2.0), atom2rdf.xsl (Atom 0.3) or atom2rdf.xsl (Atom 1.0), then posted them to a Talis N2 store. From there, they can be queried using free text queries (with facets) or SPARQL. (Note: they're not actually all in there yet…).
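
Choosing a stylesheet means first working out what kind of feed came back. The following is a minimal sketch of one way to do that, by looking at the root element of each document. The function name `classify_feed`, the category labels and the `STYLESHEETS` mapping are illustrative assumptions, not the code actually used for this analysis.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ATOM_10_NS = "http://www.w3.org/2005/Atom"
ATOM_03_NS = "http://purl.org/atom/ns#"


def classify_feed(xml_text):
    """Return a coarse feed type based on the document's root element.

    Hypothetical helper: the labels mirror the 'Feed type' table, but the
    rules used for the original counts are not known.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Empty responses and malformed XML both end up here.
        return "Unknown/Empty"
    if root.tag == "{%s}RDF" % RDF_NS:
        return "RSS 1.0 (RDF)"
    if root.tag == "rss":
        return "RSS 2.0" if root.get("version") == "2.0" else "Other RSS"
    if root.tag in ("{%s}feed" % ATOM_10_NS, "{%s}feed" % ATOM_03_NS):
        return "Atom"
    # Well-formed XML that is not a feed, e.g. an XHTML error page.
    return "Unknown/Empty"


# Stylesheet names as given in the post; RSS 1.0 is already RDF/XML
# and so needs no conversion.
STYLESHEETS = {"RSS 2.0": "rss2rdf.xsl", "Atom": "atom2rdf.xsl"}

if __name__ == "__main__":
    sample = '<rss version="2.0"><channel/></rss>'
    kind = classify_feed(sample)
    print(kind, STYLESHEETS.get(kind))
```

Checking the root element rather than the Content-Type header is a deliberate choice here: as the tables below show, many feeds are served as text/html or with no content type at all.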

I also did a little analysis:

Feed type        Number of feeds
RSS 1.0 (RDF)    5635
RSS 2.0          6420
Other RSS        66
Atom             6
Unknown/Empty    74

Response code            Number of feeds
200 OK                   11590
302 Moved temporarily    298
0                        269
404 Not Found            106
301 Moved permanently    54
410 Gone                 12
500 Server error         1

Content type (lower-cased)                Number of feeds
text/xml                                  3328
text/html; charset=utf-8                  2372
application/rss+xml; charset=utf-8        2097
application/xml; charset=utf-8            1311
application/xml                           909
application/rdf+xml                       754
empty                                     301
application/rss+xml                       295
text/html; charset=iso-8859-1             384
text/xml; charset=utf-8                   249
text/html                                 126
text/plain; charset=utf-8                 61
text/xml; charset=iso-8859-1              55
text/plain                                51
text/html; charset=0                      22
application/rdf+xml; charset=utf-8        11
application/xml; charset=iso-8859-1       3
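
A tally like the content type table can be produced by lower-casing each Content-Type header and counting the results. This is a minimal sketch, assuming the header values have already been collected into a list; `normalise_content_type` and `tally_content_types` are hypothetical names, and treating a missing or blank header as "empty" is an assumption about how that row was derived.

```python
from collections import Counter


def normalise_content_type(header_value):
    """Lower-case a Content-Type header; missing or blank becomes 'empty'."""
    if header_value is None or not header_value.strip():
        return "empty"
    return header_value.strip().lower()


def tally_content_types(header_values):
    """Count how many feeds were served with each normalised content type."""
    return Counter(normalise_content_type(value) for value in header_values)


if __name__ == "__main__":
    # Made-up sample headers, for illustration only.
    headers = ["text/xml", "Text/XML", None, "application/rss+xml; charset=UTF-8"]
    for content_type, count in tally_content_types(headers).most_common():
        print(content_type, count)
```

The charset parameter is kept as part of the key, which is why the table lists "text/xml" and "text/xml; charset=utf-8" separately.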

Namespaces used                                       Number of feeds
http://purl.org/dc/elements/1.1/                      7858
http://purl.org/rss/1.0/                              5655
http://www.w3.org/1999/02/22-rdf-syntax-ns#           5632
http://prismstandard.org/namespaces/1.2/basic/        4178
http://purl.org/rss/1.0/modules/syndication/          2189
http://webns.net/mvcb/                                2147
http://purl.org/rss/1.0/modules/prism/                950
http://xmlns.com/foaf/0.1/                            697
http://purl.org/rss/1.0/modules/content/              423
http://www.w3.org/1999/xhtml                          212
http://web.resource.org/cc/                           209
http://www.biomedcentral.com/xml/schemas/extra/       208
http://www.w3.org/2005/Atom                           90
http://www.w3.org/XML/1998/namespace                  10
http://rssnamespace.org/feedburner/ext/1.0            9
http://www.w3.org/2003/01/geo/wgs84_pos#              1
http://www.w3.org/2000/01/rdf-schema#                 1
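
One way to arrive at per-feed namespace counts like these is to collect the namespace of every element and attribute name in a feed, then count each namespace once per feed. The sketch below assumes that definition of "used" (as opposed to merely declared); the function names are hypothetical and the original counts may have been produced differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def namespaces_used(xml_text):
    """Return the set of namespace URIs used by element or attribute names."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return set()
    uris = set()
    for element in root.iter():
        # ElementTree writes qualified names as '{namespace-uri}local-name'.
        for name in [element.tag] + list(element.attrib):
            if isinstance(name, str) and name.startswith("{"):
                uris.add(name[1:].split("}", 1)[0])
    return uris


def count_namespaces(feeds):
    """Count, for each namespace, the number of feeds that use it."""
    counts = Counter()
    for xml_text in feeds:
        # Updating with a set adds one per namespace per feed.
        counts.update(namespaces_used(xml_text))
    return counts


if __name__ == "__main__":
    sample = (
        '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<channel><dc:creator>X</dc:creator></channel></rss>"
    )
    print(count_namespaces([sample]).most_common())
```

Restricting the inner loop to names in the PRISM namespace and keeping the local name instead of the URI would give a per-element tally of the same shape.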

PRISM elements used    Number of feeds
issn                   5100
volume                 4377
startingPage           4315
number                 4131
endingPage             4051
publicationName        3840
coverDisplayDate       2345
distributor            2147
isPartOf               2137
section                1996
eIssn                  1977
publicationDate        1547
copyright              485
rightsAgent            445
teaser                 49
issueName              29
publisher              24
issueIdentifier        10
coverDate              8
category               1

Feed hosts using the PRISM 1.2 namespace    Number of feeds
www.informaworld.com                        1391
www3.interscience.wiley.com                 1276
api.ingentaconnect.com                      756
www.inderscience.com                        243
www.nature.com                              100
www.biomedcentral.com                       62
www.palgrave-journals.com                   60
ej.iop.org                                  49
www.rsc.org                                 22
www.equinoxjournals.com                     16
www.edpsciences.org                         14
feeds.aps.org                               10
journals.iucr.org                           8
www.physmathcentral.com                     5
www.ipap.jp                                 2
www.seer.furg.br                            2
pi.library.yorku.ca                         2
[160 others]                                1
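
Grouping feeds by host, as in the last table, only needs the feed URL and the set of namespaces found in each feed. This is a minimal sketch under that assumption; `hosts_using_namespace` is a hypothetical helper and the URL paths in the usage example are made up.

```python
from collections import Counter
from urllib.parse import urlparse

PRISM_12 = "http://prismstandard.org/namespaces/1.2/basic/"


def hosts_using_namespace(feeds, namespace=PRISM_12):
    """Count feeds per host among those that use the given namespace.

    feeds is an iterable of (feed_url, namespaces) pairs, where namespaces
    is the set of namespace URIs found in that feed.
    """
    hosts = Counter()
    for url, namespaces in feeds:
        if namespace in namespaces:
            hosts[urlparse(url).hostname] += 1
    return hosts


if __name__ == "__main__":
    # Hypothetical feed URLs, for illustration only.
    feeds = [
        ("http://www.nature.com/example-a.rdf", {PRISM_12}),
        ("http://www.nature.com/example-b.rdf", {PRISM_12}),
        ("http://ej.iop.org/example.rdf", {PRISM_12}),
        ("http://www.example.org/feed.xml", set()),
    ]
    for host, count in hosts_using_namespace(feeds).most_common():
        print(host, count)
```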