Exploring PubChem via SPARQL

There's a triplestore of PubChem on bio2rdf.org, with a SPARQL interface, but no documentation or example queries.

Luckily, "RDF as self-describing data" suggests some queries for interrogating an RDF dataset. Here's the essence of that, refined for this particular use case:

  1. SELECT DISTINCT ?Concept ?g WHERE {GRAPH ?g {[] a ?Concept}}
    => <http://bio2rdf.org/ns/ns/ns/pubchem#Substance> | <http://bio2rdf.org/pubchem>
  2. SELECT ?x WHERE {?x a <http://bio2rdf.org/ns/ns/ns/pubchem#Substance>} LIMIT 5
    => e.g. <http://bio2rdf.org/pubchem:10007>
  3. SELECT ?p ?o WHERE {<http://bio2rdf.org/pubchem:10007> ?p ?o}
    => all the properties of this substance, including <http://bio2rdf.org/ns/pubchem:InChI>
  4. SELECT ?x WHERE {?x <http://bio2rdf.org/ns/pubchem:InChI> "InChI=1/C19H24N2O4/c1-13(9-14-3-6-16(25-2)7-4-14)20-11-19(24)15-5-8-18(23)17(10-15)21-12-22/h3-8,10,12-13,19-20,23-24H,9,11H2,1-2H3,(H,21,22)/f/h21H"}
    => finds this same substance again, by its InChI string
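
The steps above can also be scripted. Here's a rough Python sketch that builds the first three queries and sends them over the standard SPARQL protocol (an HTTP GET with a "query" parameter). The endpoint URL is a placeholder, not the real bio2rdf address, and the function names are made up for illustration. It also assumes the endpoint can return SPARQL JSON results, which I haven't checked for bio2rdf.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint (an assumption): substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

SUBSTANCE = "http://bio2rdf.org/ns/ns/ns/pubchem#Substance"


def classes_query():
    # Step 1: which classes exist, and in which named graphs?
    return "SELECT DISTINCT ?Concept ?g WHERE {GRAPH ?g {[] a ?Concept}}"


def instances_query(class_uri, limit=5):
    # Step 2: a few instances of the given class.
    return "SELECT ?x WHERE {?x a <%s>} LIMIT %d" % (class_uri, limit)


def properties_query(subject_uri):
    # Step 3: every property/value pair of one resource.
    return "SELECT ?p ?o WHERE {<%s> ?p ?o}" % subject_uri


def build_url(query, endpoint=ENDPOINT):
    # SPARQL protocol: the query travels URL-encoded in the "query" parameter.
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT):
    # Ask for JSON results; assumes the endpoint supports this media type.
    request = urllib.request.Request(
        build_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

With a real endpoint filled in, run_query(instances_query(SUBSTANCE)) should give a list of bindings, and each row["x"]["value"] can then be fed to properties_query to carry on exploring.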

This substance is also available in HTML through the bio2rdf web interface, for browsing.

Looks like the data could do with re-importing: it doesn't have all the PubChem fields and some of the namespaces are a bit weird ("http://bio2rdf.org/ns/ns/ns/pubchem"?), but it's a good start.