Generating Standard Chemical Identifiers (Standard InChI)

Updated for version 1.03 (where standard and non-standard InChI generation have been combined into one binary).

InChI is the IUPAC International Chemical Identifier: a notation that represents a chemical structure as a text string.

It has been around for a few years, but there was a problem with using InChI to identify chemicals: the software that generates the InChI string accepted options that altered the output. The same chemical could therefore end up with different InChI strings in different databases, so you couldn't search across databases using a single InChI string.

Hence Standard InChI: the same notation, but generated with a fixed set of options, so that a given molecule always produces the same InChI string. Standard InChIs are designated by the prefix "InChI=1S/".
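Since the prefix is what marks an identifier as standard, a script can tell the two kinds apart with a simple string check. The following is a minimal sketch: `is_standard_inchi` is a hypothetical helper name, not part of the InChI software, and it only looks at the prefix rather than validating the rest of the string.

```shell
#!/bin/bash
# is_standard_inchi: hypothetical helper (not part of the InChI software).
# Succeeds if the argument starts with the Standard InChI prefix "InChI=1S/".
# It does not check that the rest of the string is a valid InChI.
is_standard_inchi() {
    case "$1" in
        "InChI=1S/"*) return 0 ;;
        *)            return 1 ;;
    esac
}

# methane, written with a standard and a non-standard prefix
for id in 'InChI=1S/CH4/h1H4' 'InChI=1/CH4/h1H4'; do
    if is_standard_inchi "$id"; then
        echo "standard:     $id"
    else
        echo "non-standard: $id"
    fi
done
```

A check like this could be used to filter identifiers before using them for a cross-database search.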

Code for generating Standard InChIs can be downloaded from the IUPAC site. You want INCHI-1-API.zip.

To generate the inchi-1 executable:

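#!/bin/bash
# download and unpack the InChI 1.03 source
wget 'http://www.iupac.org/inchi/download/version1.03/INCHI-1-API.zip'
unzip INCHI-1-API.zip
# build the command-line executable using the gcc makefile
cd INCHI-1-API/INCHI/gcc/inchi-1
make
# install it somewhere on the PATH
sudo cp inchi-1 /usr/local/bin/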
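Before going further it may be worth confirming that the binary actually ended up on the PATH. This is a small sketch using only the shell's built-in `command -v`; `have_tool` is a hypothetical helper name introduced here for illustration, and the loop also checks for the `babel` converter that the test further down relies on.

```shell
#!/bin/bash
# have_tool: hypothetical helper; succeeds if the named command is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# inchi-1 is the executable built above; babel comes with OpenBabel
for tool in inchi-1 babel; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```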

You can test it with a chemical structure file from Nature Chemical Biology, if you have OpenBabel installed:

#!/bin/bash
wget 'http://www.nature.com/nchembio/journal/v5/n1/chemdraw/nchembio.133-comp1.cdx'
babel -icdx nchembio.133-comp1.cdx -osdf chem.sdf # convert CDX file to SDF
inchi-1 chem.sdf

This generates a Standard InChI:

InChI=1S/C6H9N3O3/c7-4(5(10)11)1-3-2-8-6(12)9-3/h2,4H,1,7H2,(H,10,11)(H2,8,9,12)/t4-/m0/s1

The compound is 2-amino-3-(2-oxo-1,3-dihydroimidazol-4-yl)propanoic acid.
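The InChI string above is easier to read once it is split into its layers, which are separated by "/": first the version ("1S"), then the molecular formula, then layers identified by a one-letter prefix, such as "c" for connectivity, "h" for hydrogens and "t", "m", "s" for stereochemistry. As a rough sketch (`inchi_layers` is a hypothetical helper name, and it simply splits on "/" without interpreting the layers):

```shell
#!/bin/bash
# inchi_layers: hypothetical helper; prints one InChI layer per line.
# It strips the leading "InChI=" and splits the remainder on "/".
inchi_layers() {
    printf '%s\n' "${1#InChI=}" | tr '/' '\n'
}

inchi='InChI=1S/C6H9N3O3/c7-4(5(10)11)1-3-2-8-6(12)9-3/h2,4H,1,7H2,(H,10,11)(H2,8,9,12)/t4-/m0/s1'
inchi_layers "$inchi"
```

For this string the first line printed is the version, 1S, and the second is the formula, C6H9N3O3.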

Most databases don't yet have Standard InChIs indexed for all their articles, but that Standard InChI should soon show up in PubChem, ChemSpider and elsewhere.