Updated for version 1.03 (where standard and non-standard InChI generation have been combined into one binary).
InChI is the International Chemical Identifier - a notation that allows a chemical to be represented as a string.
InChI has been around for a few years, but there was a problem with using it to identify chemicals: the code used to generate the InChI string accepted options that altered the output, so the same chemical could end up with different InChI strings in different databases, and you couldn't search across databases using a single InChI string.
Hence, Standard InChI: the same thing but with a standard, immutable set of characteristics that will always produce the same InChI string for a given molecule. Standard InChIs are designated by the prefix "InChI=1S/".
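Since the prefix is what marks a Standard InChI, a quick check is easy to script. This is only a minimal sketch: the function name is_standard_inchi is made up for illustration, it tests the prefix and nothing else (it does not validate the rest of the string), and the methane strings are just example inputs.

```shell
#!/bin/bash
# Hypothetical helper (not part of the InChI software): succeeds if the
# argument starts with the Standard InChI prefix "InChI=1S/".
is_standard_inchi() {
  case "$1" in
    'InChI=1S/'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example inputs: methane with and without the "S" in the prefix.
is_standard_inchi 'InChI=1S/CH4/h1H4' && echo "standard"
is_standard_inchi 'InChI=1/CH4/h1H4' || echo "not standard"
```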
Code for generating Standard InChIs can be downloaded from the IUPAC site. You want INCHI-1-API.zip.
To generate the inchi-1 executable:
#!/bin/bash
wget 'http://www.iupac.org/inchi/download/version1.03/INCHI-1-API.zip'
unzip INCHI-1-API.zip
cd INCHI-1-API/INCHI/gcc/inchi-1
make
sudo cp inchi-1 /usr/local/bin/
You can test it with a chemical structure file from Nature Chemical Biology, if you have OpenBabel installed:
#!/bin/bash
wget 'http://www.nature.com/nchembio/journal/v5/n1/chemdraw/nchembio.133-comp1.cdx'
babel -icdx nchembio.133-comp1.cdx -osdf chem.sdf # convert CDX file to SDF
inchi-1 chem.sdf
This generates a Standard InChI:
InChI=1S/C6H9N3O3/c7-4(5(10)11)1-3-2-8-6(12)9-3/h2,4H,1,7H2,(H,10,11)(H2,8,9,12)/t4-/m0/s1
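The InChI string is a series of layers separated by "/": the version prefix first, then the molecular formula, then further layers. If you want to pull out pieces of it in a script, something like the following should work. It is a rough sketch that assumes bash; the function name inchi_formula is invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the molecular formula layer of an InChI,
# i.e. the second "/"-separated field.
inchi_formula() {
  local layers
  IFS='/' read -r -a layers <<< "$1"
  printf '%s\n' "${layers[1]}"
}

inchi='InChI=1S/C6H9N3O3/c7-4(5(10)11)1-3-2-8-6(12)9-3/h2,4H,1,7H2,(H,10,11)(H2,8,9,12)/t4-/m0/s1'

# Print only the formula layer.
inchi_formula "$inchi"

# List every layer, one per line.
IFS='/' read -r -a all_layers <<< "$inchi"
printf '%s\n' "${all_layers[@]}"
```

For the string above, the formula layer is C6H9N3O3.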
Most databases don't yet have Standard InChIs indexed for all their records, but soon that Standard InChI should show up in PubChem, ChemSpider and elsewhere.