Academic publishers produce RSS feeds of the articles in the latest issue of each of their journals. These Table of Contents (TOC) feeds may be rich in metadata (mainly when they're published as RSS 1.0, which is RDF/XML), or they may use a generic syndication format like RSS 2.0 or Atom.
The ticTOCs project is funded by JISC and aims to increase the quantity and quality of TOC feeds produced by publishers. Recently they made available the list of feed URLs that they've collected.
I fetched all the listed feeds (just over 13,000), converted them to RDF using rss2rdf.xsl (RSS 2.0), atom2rdf.xsl (Atom 0.3) or atom2rdf.xsl (Atom 1.0), then posted them to a Talis N2 store. From there, they can be queried using free text queries (with facets) or SPARQL. (Note: they're not actually all in there yet…).
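
As a rough illustration of the conversion step, here is a minimal Python sketch of how a fetched feed might be classified and matched to a stylesheet. The stylesheet names come from the paragraph above. The function name `detect_feed_type`, the detection rules, and the handling of edge cases are assumptions for illustration, not the code actually used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"
ATOM10_NS = "http://www.w3.org/2005/Atom"
ATOM03_NS = "http://purl.org/atom/ns#"

# Stylesheet names as given in the text. RSS 1.0 is already RDF/XML,
# so it has no entry here.
STYLESHEETS = {
    "RSS 2.0": "rss2rdf.xsl",
    "Atom 0.3": "atom2rdf.xsl",
    "Atom 1.0": "atom2rdf.xsl",
}


def detect_feed_type(data):
    """Classify a feed by its root element (hypothetical helper)."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Empty responses and malformed XML/HTML end up here.
        return "Unknown/Empty"
    if root.tag == f"{{{RDF_NS}}}RDF":
        # Assumption: an RDF root only counts as RSS 1.0 if it has a channel.
        if root.find(f"{{{RSS10_NS}}}channel") is not None:
            return "RSS 1.0 (RDF)"
        return "Unknown/Empty"
    if root.tag == "rss":
        return "RSS 2.0" if root.get("version") == "2.0" else "Other RSS"
    if root.tag == f"{{{ATOM10_NS}}}feed":
        return "Atom 1.0"
    if root.tag == f"{{{ATOM03_NS}}}feed":
        return "Atom 0.3"
    return "Unknown/Empty"


if __name__ == "__main__":
    sample = b'<rss version="2.0"><channel><title>TOC</title></channel></rss>'
    feed_type = detect_feed_type(sample)
    print(feed_type, STYLESHEETS.get(feed_type))
```

Applying the chosen stylesheet would need an XSLT processor, which the Python standard library does not include, so that step is left out of the sketch.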
I also did a little analysis:
Feed type | Number of feeds |
RSS 1.0 (RDF) | 5635 |
RSS 2.0 | 6420 |
Other RSS | 66 |
Atom | 6 |
Unknown/Empty | 74 |
Response code | Number of feeds |
200 OK | 11590 |
302 Moved temporarily | 298 |
0 (no response) | 269 |
404 Not Found | 106 |
301 Moved permanently | 54 |
410 Gone | 12 |
500 Server error | 1 |
Content type (lower-cased) | Number of feeds |
text/xml | 3328 |
text/html; charset=utf-8 | 2372 |
application/rss+xml; charset=utf-8 | 2097 |
application/xml; charset=utf-8 | 1311 |
application/xml | 909 |
application/rdf+xml | 754 |
empty | 301 |
application/rss+xml | 295 |
text/html; charset=iso-8859-1 | 384 |
text/xml; charset=utf-8 | 249 |
text/html | 126 |
text/plain; charset=utf-8 | 61 |
text/xml; charset=iso-8859-1 | 55 |
text/plain | 51 |
text/html; charset=0 | 22 |
application/rdf+xml; charset=utf-8 | 11 |
application/xml; charset=iso-8859-1 | 3 |
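
The response code and content type counts above could be gathered along these lines. This is only a sketch under stated assumptions: that redirects were recorded rather than followed (the tables list 301 and 302 separately), that code 0 stands for a request that got no HTTP response, and that "empty" stands for a missing Content-Type header. The helper names `normalise_content_type`, `fetch_status` and `tally` are hypothetical.

```python
import http.client
from collections import Counter
from urllib.parse import urlsplit


def normalise_content_type(header):
    """Lower-case the Content-Type header; a missing header becomes 'empty'."""
    if not header or not header.strip():
        return "empty"
    return header.strip().lower()


def fetch_status(url, timeout=30.0):
    """Return (status code, content type) for one URL without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            return resp.status, normalise_content_type(resp.getheader("Content-Type"))
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Assumption: a failed connection is recorded as status 0.
        return 0, "empty"


def tally(results):
    """Count status codes and content types from (status, content_type) pairs."""
    codes = Counter(code for code, _ in results)
    types = Counter(ctype for _, ctype in results)
    return codes, types
```

Calling `tally(fetch_status(url) for url in feed_urls)`, where `feed_urls` is the list of collected feed URLs, would then give the two counters to print in descending order with `most_common()`.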
Namespaces used | Number of feeds |
http://purl.org/dc/elements/1.1/ | 7858 |
http://purl.org/rss/1.0/ | 5655 |
http://www.w3.org/1999/02/22-rdf-syntax-ns# | 5632 |
http://prismstandard.org/namespaces/1.2/basic/ | 4178 |
http://purl.org/rss/1.0/modules/syndication/ | 2189 |
http://webns.net/mvcb/ | 2147 |
http://purl.org/rss/1.0/modules/prism/ | 950 |
http://xmlns.com/foaf/0.1/ | 697 |
http://purl.org/rss/1.0/modules/content/ | 423 |
http://www.w3.org/1999/xhtml | 212 |
http://web.resource.org/cc/ | 209 |
http://www.biomedcentral.com/xml/schemas/extra/ | 208 |
http://www.w3.org/2005/Atom | 90 |
http://www.w3.org/XML/1998/namespace | 10 |
http://rssnamespace.org/feedburner/ext/1.0 | 9 |
http://www.w3.org/2003/01/geo/wgs84_pos# | 1 |
http://www.w3.org/2000/01/rdf-schema# | 1 |
PRISM elements used | Number of feeds |
issn | 5100 |
volume | 4377 |
startingPage | 4315 |
number | 4131 |
endingPage | 4051 |
publicationName | 3840 |
coverDisplayDate | 2345 |
distributor | 2147 |
isPartOf | 2137 |
section | 1996 |
eIssn | 1977 |
publicationDate | 1547 |
copyright | 485 |
rightsAgent | 445 |
teaser | 49 |
issueName | 29 |
publisher | 24 |
issueIdentifier | 10 |
coverDate | 8 |
category | 1 |
Feed hosts using the PRISM 1.2 namespace | Number of feeds |
www.informaworld.com | 1391 |
www3.interscience.wiley.com | 1276 |
api.ingentaconnect.com | 756 |
www.inderscience.com | 243 |
www.nature.com | 100 |
www.biomedcentral.com | 62 |
www.palgrave-journals.com | 60 |
ej.iop.org | 49 |
www.rsc.org | 22 |
www.equinoxjournals.com | 16 |
www.edpsciences.org | 14 |
feeds.aps.org | 10 |
journals.iucr.org | 8 |
www.physmathcentral.com | 5 |
www.ipap.jp | 2 |
www.seer.furg.br | 2 |
pi.library.yorku.ca | 2 |
[160 other hosts] | 1 each |
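
One plausible way to produce the namespace, PRISM element and host counts is sketched below. It assumes that a namespace or element is counted once per feed in which it appears, that namespaces are taken from element and attribute names actually used (which would explain why the XML namespace shows up), and that the PRISM element counts cover both PRISM namespaces listed above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

PRISM_12 = "http://prismstandard.org/namespaces/1.2/basic/"
PRISM_NAMESPACES = {PRISM_12, "http://purl.org/rss/1.0/modules/prism/"}


def split_name(qname):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if qname.startswith("{"):
        uri, _, local = qname[1:].partition("}")
        return uri, local
    return "", qname


def names_used(data):
    """Return (namespaces, PRISM local names) used in one feed document."""
    namespaces, prism = set(), set()
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return namespaces, prism
    for elem in root.iter():
        # Look at the element name and at every attribute name.
        for qname in [elem.tag, *elem.attrib]:
            uri, local = split_name(qname)
            if uri:
                namespaces.add(uri)
            if uri in PRISM_NAMESPACES:
                prism.add(local)
    return namespaces, prism


def tally(feeds):
    """feeds: iterable of (url, xml bytes). Each item counts once per feed."""
    ns_counts, prism_counts, host_counts = Counter(), Counter(), Counter()
    for url, data in feeds:
        namespaces, prism = names_used(data)
        ns_counts.update(namespaces)
        prism_counts.update(prism)
        if PRISM_12 in namespaces:
            host_counts[urlsplit(url).hostname] += 1
    return ns_counts, prism_counts, host_counts
```

Because sets are used per feed, a feed with hundreds of `prism:issn` elements still adds only one to the `issn` count, which matches the "Number of feeds" column.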