Stephanie Haustein and colleagues recently described the lack of correlation between tweets about an article (using Altmetric data from July 2011 - December 2012) and formal citations of the article. They investigated 1.4 million articles indexed in both PubMed and Web of Science.
I decided to look at the data for smaller sets of articles, published in specific journals. After writing a few scripts to fetch and parse data to CSV from various web services, using the DOI as the key for each row, I realised that it would be easier to gather the data in OpenRefine by incrementally adding columns.
Start OpenRefine
Launch the OpenRefine application and open the web interface.

Import a CSV file
Create a new project. Enter the URL of some CSV data (e.g. the output of a Solr query), or choose a local CSV file (e.g. the result of a SPARQL query). Check the settings (UTF-8 encoding, comma separator, ignore blank rows) > Create project

Fetch JSON data from a URL
"doi" column > Edit column > Add column by fetching URLs ("altmetric_json") > GREL:
'http://api.altmetric.com/v1/doi/' + value.escape('url')

Parse JSON data to a new column
"altmetric_json" column > Edit column > Add column based on this column ("tweets") > GREL:
value.parseJson().cited_by_tweeters_count

Fetch XML data from a URL
"doi" column > Edit column > Add column by fetching URLs ("crossref_xml") > GREL:
'http://www.crossref.org/openurl/?noredirect=true&pid=' + 'YOUR_CROSSREF_PID'.escape('url') + '&id=doi:' + value.escape('url')

Parse XML data to a new column
"crossref_xml" column > Edit column > Add column based on this column ("citations") > GREL:
value.parseHtml().select('query')[0].htmlAttr('fl_count').toNumber()

Merge data from another project
Import a CSV file, with columns "doi" and "citations", to a new project named "citations_scopus". "doi" column of original project > Edit column > Add column based on this column > GREL:
cell.cross("citations_scopus", "doi").cells["citations"].value[0]

Scatterplot facet
"tweets" column > Facet > Scatterplot facet

Export to CSV
Export > Custom Tabular Exporter > unselect JSON and XML columns > Download

Import to Google Spreadsheets

One limitation of the data used here is that the dates of each tweet and citation are not known; it might be interesting to correlate tweets and citations during specific windows of time after article publication.

Note: I've realised that there's missing data for up to half of the articles, so it would be incorrect to draw any conclusions yet.
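
For anyone who would rather script this than click through OpenRefine, here is a rough Python sketch of the same URL-building, parsing and merging steps as the GREL expressions above. It is only an illustration, not the scripts I originally wrote: the function names are made up, the shape of the Crossref response (a "query" element carrying an "fl_count" attribute) is assumed from the GREL expression, and fetching the URLs is left out so that the parsing can be tried on saved responses.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import quote

ALTMETRIC_BASE = "http://api.altmetric.com/v1/doi/"
CROSSREF_BASE = "http://www.crossref.org/openurl/"


def altmetric_url(doi):
    """Build the Altmetric URL for a DOI (like value.escape('url') in GREL)."""
    return ALTMETRIC_BASE + quote(doi, safe="")


def crossref_url(doi, pid):
    """Build the Crossref OpenURL query; pid is your own Crossref PID."""
    return (CROSSREF_BASE + "?noredirect=true&pid=" + quote(pid, safe="")
            + "&id=doi:" + quote(doi, safe=""))


def tweet_count(altmetric_json):
    """Return cited_by_tweeters_count, or None if the response is unusable."""
    if not altmetric_json:
        return None
    try:
        data = json.loads(altmetric_json)
    except ValueError:  # e.g. a plain-text error body instead of JSON
        return None
    if not isinstance(data, dict):
        return None
    return data.get("cited_by_tweeters_count")


def citation_count(crossref_xml):
    """Return fl_count from the first 'query' element (assumed response shape)."""
    if not crossref_xml:
        return None
    root = ET.fromstring(crossref_xml)
    for element in root.iter():
        # Strip any XML namespace prefix before comparing the tag name.
        if element.tag.rsplit("}", 1)[-1] == "query":
            count = element.get("fl_count")
            return int(count) if count is not None else None
    return None


def merge_by_doi(rows, scopus_citations):
    """Add a citations_scopus value to each row, matched on the DOI key."""
    return [dict(row, citations_scopus=scopus_citations.get(row["doi"]))
            for row in rows]
```

Rows with no match (or no usable response) end up with None rather than zero, which keeps the missing data visible instead of hiding it in the scatterplot.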