Project

Chemical Nomenclature and Structure Representation Division (VIII)

 

Number: 2000-025-1-800

 

Title: IUPAC International Chemical Identifier

Task Group
Chairman:
A. McNaught

Members: S. Heller and S. Stein

Remarks:
- Initiated by the ad hoc Committee on Chemical Identity and Nomenclature Systems.
- In July 2004, the Identifier was renamed the IUPAC-NIST Chemical Identifier (INChI; formerly IChI) to acknowledge the development work at NIST.
- In November 2004, the Identifier was renamed the IUPAC International Chemical Identifier (InChI) to allow trademark, copyright and licensing issues to be resolved.

Completion Date: 2005 - project completed

Objective:
The objective of the IUPAC Chemical Identifier Project is to establish a unique label, the IUPAC Chemical Identifier: a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, making it easier to link diverse data compilations.

Description:
Develop a set of algorithms for the standard representation of chemical structures that will be readily accessible to chemists in all countries at no cost. The standard chemical representation could be used as input into existing and newly developed computer programs to generate an IUPAC name and a unique IUPAC identifier.

> See detailed description

Progress:
Our initial work has focussed on the development of algorithms for converting an input organic chemical structure to a unique (canonical) form. This, in effect, involves the unique numbering of each atom, with equivalent atoms being assigned identical numbers. "Serializing" the result to create a string is the final, straightforward step in creating an identifier.
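
The idea of giving equivalent atoms identical numbers can be illustrated with a small sketch. This is not the project's algorithm, which this page does not describe in detail; it is a generic neighbourhood-refinement scheme of the kind reported in the literature, and the function names `refine_classes` and `_rank` are hypothetical. Atoms are given as a list of element symbols and bonds as pairs of zero-based atom indices. Refinement of this kind cannot separate every pair of non-equivalent atoms (highly symmetric graphs need extra tie-breaking), so a real canonical numbering requires further steps.

```python
from collections import defaultdict


def _rank(keys):
    """Map each key to its 1-based position among the sorted distinct keys."""
    order = {key: r for r, key in enumerate(sorted(set(keys)), start=1)}
    return [order[key] for key in keys]


def refine_classes(symbols, bonds):
    """Return one class number per atom; atoms that cannot be told apart
    by element and neighbourhood share the same number."""
    neighbours = defaultdict(list)
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Start by classifying atoms on element symbol alone.
    ranks = _rank(symbols)
    while True:
        # Describe each atom by its own class and the sorted classes
        # of its neighbours, then re-rank on that description.
        keys = [
            (ranks[i], tuple(sorted(ranks[j] for j in neighbours[i])))
            for i in range(len(symbols))
        ]
        new_ranks = _rank(keys)
        if new_ranks == ranks:  # no class was split: stable partition
            return ranks
        ranks = new_ranks
```

For the three carbon atoms of propane, `refine_classes(["C", "C", "C"], [(0, 1), (1, 2)])` places the two terminal carbons in one class and the central carbon in another.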

As discussed at the IUPAC meeting held in Cambridge in August 2000 to consider the feasibility of the project, most of the ideas employed in this work have been reported in the technical literature. The principal task of this project has been to identify and implement a workable, robust set of procedures that will provide effective IChI processing for a large proportion of organic chemical structures in common use.

At the Cambridge meeting it was agreed to develop a "layered" approach, where different levels of structural information are separately represented in the identifier. Work has consequently proceeded by step-by-step building of the individual layers. Since the order of application of the layers could affect the final labeling, this process is somewhat more complex than might initially appear.

The layers under development are:

  1. constitutional - expresses pure connectivity of the atoms
  2. stereochemical - includes conventional C-atom sp2 and sp3 stereochemistry
  3. isotopic - enables isotopes to be distinguished
  4. tautomeric - implements simple forms of rapid H-migration isomerization
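
To make the layered idea concrete, the sketch below splits an identifier string into its layers. The `/`-separated layout and the single-letter layer prefixes are those of the later released InChI version 1 format, not something specified in this progress report, and the function name `split_layers` and the descriptive labels in `LAYER_NAMES` are illustrative only. The sketch assumes the string contains a formula segment and lists only a few of the prefixes.

```python
# Descriptive labels for a few layer prefixes (illustrative, not exhaustive).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereochemistry",
    "t": "sp3 stereochemistry",
    "i": "isotopic",
}


def split_layers(identifier):
    """Split an InChI-style string into (layer name, content) pairs."""
    prefix, _, body = identifier.partition("=")
    if prefix != "InChI":
        raise ValueError("expected a string starting with 'InChI='")
    # Assumption: the version is followed directly by a formula segment.
    version, formula, *segments = body.split("/")
    layers = [("version", version), ("formula", formula)]
    for segment in segments:
        # The first character of each remaining segment names its layer.
        name = LAYER_NAMES.get(segment[:1], "other")
        layers.append((name, segment[1:]))
    return layers
```

Applied to the version 1 identifier for ethanol, `InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3`, this yields the version, the formula, a connectivity layer `1-2-3` and a hydrogen layer `3H,2H2,1H3`.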

Initial implementation and testing of this work have been completed, with the exception of the following two items:

  1. Representing stereochemistry in systems with moving (mesomeric) bonds and electrons.
  2. Representing H-migration tautomerism in systems containing 5-membered rings.

The first of these items does not seem to have been addressed adequately in the literature, although appropriate processing algorithms have been found in mathematical journals.

We hope to complete these remaining tasks within two months and then to implement the IChI processor as a standalone program that can automatically process standard "MOL-files". When this is available, assistance will be sought to further test, and possibly refine, the IChI name generation process.

Depending on results of these tests and discussions, it will be decided whether improvements or additional features are desirable, and, if so, whether these need to be followed by another round of testing. For instance, it needs to be determined whether the first version should allow a canonical representation of partially-specified stereochemical structures.

Finally, as discussed in the Cambridge meeting, there are no plans to include the following structural representations in the first version:

  1. non C-atom sp2 and sp3 stereochemistry
  2. ring-chain tautomerism (or any other variety not involving simple H-migration)
  3. non-covalent bonds

March 2002 update
The first beta-test version of the program is now available. It runs as a conventional Windows application under 32-bit Microsoft Windows operating systems. Neither the underlying algorithms nor the program has been perfected; this distribution is intended primarily to allow others to participate in further development.

This program treats only covalently bonded compounds and uses Molfiles (and SDfiles) as input. Along with the executable programs, the distribution package contains documentation and example structure files.
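
As a rough illustration of the kind of input involved, the sketch below reads the atoms and bonds from a Molfile. It assumes the fixed-width V2000 connection-table layout (three header lines, a counts line, then atom and bond blocks); the page itself says only that Molfiles and SDfiles are accepted. The function name `parse_molfile` is hypothetical, and charges, isotopes, stereo flags and property lines are ignored.

```python
def parse_molfile(text):
    """Return (atom symbols, bonds) from a V2000 Molfile string.

    Bonds are (first atom, second atom, bond type) tuples using the
    file's own 1-based atom numbers.
    """
    lines = text.splitlines()
    counts = lines[3]                # line 4 holds the atom and bond counts
    n_atoms = int(counts[0:3])       # columns 1-3
    n_bonds = int(counts[3:6])       # columns 4-6

    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # The element symbol follows three 10-character coordinate fields
    # and one space.
    atoms = [line[31:34].strip() for line in atom_lines]

    # Each bond line starts with three 3-character integer fields.
    bonds = [
        (int(line[0:3]), int(line[3:6]), int(line[6:9]))
        for line in bond_lines
    ]
    return atoms, bonds
```

The fields are sliced by column rather than split on whitespace because adjacent fixed-width fields can run together in larger structures.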

The package can be obtained from Steve Stein by e-mail to [email protected]. Unless requested otherwise, the package will be delivered as a 'zip' file in an e-mail attachment to the return address.

A demonstration of Identifier generation within a (Windows) structure-drawing program, working in conjunction with the beta test program, can be obtained from Alan McNaught by e-mail to [email protected].

There was a discussion of the project at the "CAS/IUPAC Conference on Chemical Identifiers and XML for Chemistry" on July 1, 2002 in Columbus, Ohio. On the preceding day (June 30th) at the same location the Project Group met to review progress and consider comments received.

July 2002 update
At the Task Group meeting in Columbus, OH, on 30 June 2002, Steve Stein reviewed the progress made by NIST in developing the test version of the IUPAC Chemical Identifier. The test version handles simple organic molecules. In testing to date (almost 70 copies have been distributed), no examples have been reported of chemicals that the program does not handle. A number of suggestions (described below) were made regarding testing and output. The overall view was that the project is progressing considerably faster than expected.
> Download report - pdf file (118 KB)

Steve Stein gave a lecture on the project the following day at the CAS/IUPAC Conference on Chemical Identifiers and XML for Chemistry. A copy of the slides can be viewed at: http://www.hellers.com/steve/pub-talks/columbus-702/frame.htm

November 2003 update
A combined meeting for two related IUPAC projects, the XML Data Dictionary Project (#2002-022-1-024) and this Chemical Identifier Project (#2000-025-1-800), was held at the National Institute of Standards and Technology (NIST, Gaithersburg, Maryland, US) on November 12-14, 2003.

A report on that meeting is published in Chem. Int. July-Aug 2004.
A full account of the meeting is available at <www.warr.com/inchi.pdf>

July 2004 update
A new test version of the IUPAC-NIST Chemical Identifier (INChI) is now available. It replaces the previous test version issued last November. All features planned for inclusion in the final release have now been implemented and the final format for the Identifier has been proposed. The new name of the Identifier (formerly IUPAC Chemical Identifier, IChI) acknowledges the development work at NIST. The test program accepts input in the form of Molfiles (or SDfiles) and CML files. An Application Program Interface (API) for communicating with external programs is under development.

A single INChI is generated for a single input structure, which can contain multiple components. Identifiers can be created for organic compounds with Z/E and sp3 stereochemistry, tautomers, and isotopes as well as salts, organometallic compounds and protonated forms of a compound.

Test programs (for Microsoft Windows), documentation and sample structure files are available upon request from Steve Stein <[email protected]>. The project team very much welcomes comments concerning the INChI and will be glad to assist in its testing or implementation.

November 2004 update
To allow trademark, copyright and licensing issues to be resolved before distribution of version 1.0, the name of the Identifier was changed to IUPAC International Chemical Identifier (InChI).

April 2005 - project completed
Version 1 of IUPAC's International Chemical Identifier (InChI) has now been released; software, documentation, source code and licensing conditions are available from the IUPAC website at www.iupac.org/inchi.

Promotion and extension continue through project 2004-039-1-800.

> see release; > FAQ (prepared by Nick Day of the Unilever Centre for Molecular Informatics, Cambridge University; http://wwmm.ch.cam.ac.uk/inchifaq/)


Clipping
> "That INChI Feeling", Reactive Reports, Sep 2004 (issue 40)
> "Unique labels for compounds", C&EN, 26 Nov 2002
> "That ICHI feeling ...", The Alchemist, 24 Apr 2002
> "What's in a Name?", The Alchemist, 21 Mar 2002

Last Update: 14 April 2005

<project announcement published in Chem. Int. 23(3) 2001>

Page last modified 2 July 2007.
Copyright ©1997-2007 International Union of Pure and Applied Chemistry.