Vol. 28 No. 3
May-June 2006
Defining a Data Standard for Near-Infrared Spectroscopy and Chemometrics
Successful long-term storage and retrieval of analytical data and the
more-advanced techniques of data mining and knowledge generation
are made possible through the deployment of well-documented,
internationally recognized standard data formats [1].
At the end of 2001, a group of scientists with a history of
international collaboration met to discuss problems they were
encountering in exchanging near-infrared (NIR) spectroscopic
data. A more serious problem was the inability to move chemometric
data, including raw data and calibration models, among software
programs from different vendors and those arising out of various
research and development efforts. Also, although the 1988
Joint Committee on Atomic and Molecular Physical Data - Data
eXchange (JCAMP-DX) standard [2] had been adopted
piecemeal by near-infrared instrument manufacturers, there
was no data dictionary targeted specifically at the technological
needs of the NIR community.
In response to these issues, a task group was formed and began
work in 2002 on gathering information about the broader needs
of the community. Several members of the task group had worked
together on a European food-research project called Quality
Established through Spectroscopic Techniques (QUEST). The
QUEST team had sought to tackle the problem of a lack of standardization
in the fields of NIR and chemometrics by developing their
own project standards, providing a good knowledge base on
which future efforts could build. This IUPAC project would
use this knowledge base as a starting point, but the solutions
that it created would need to be of broader scope, covering
a wider range of instrumentation types than that deployed
in the food and beverage arena [3,4].
The NIR and Chemometrics Data Exchange Standards group meeting
was attended by (from left to right) Rasmus Bro (KVL,
Denmark), Mohamed Hanafi (ENITIAA/INRA, France), Douglas
Rutledge (INA P-G, France), Tony Davies (Waters Informatics,
Germany), Gerard Downey (Teagasc, Ireland), Jeremy Shaver
(Eigenvector Research, U.S.) and Ian Cowe (FOSS NIR Systems,
Sweden).
Most of the initial IUPAC work was conducted electronically and
progressed slowly. Because the members of the group are all very
active in industry, academia, and government laboratories, it was
eventually concluded that a face-to-face meeting would be
beneficial. There were several
open issues that needed clarification, and, although the NIR
work and the chemometrics work had separate objectives, with
distinct timelines, having the entire task group work in both
areas simultaneously had been problematic because of the different
knowledge required for the two efforts. A meeting was thus
called in January 2006 in Dublin, Ireland, with the aim of
addressing these issues, getting the project back on track,
and exploring the possibility of restructuring the task group
into two parallel action groups corresponding to the two separate
objectives.
It was particularly important to get the group moving again
because the two recommendations it would be generating would
be required for inclusion as the standard data dictionaries
for Phase 2 and Phase 3 of the work on the new XML Analytical
Information Markup Language (AnIML) data standards being conducted
jointly with the American Society for Testing and Materials (ASTM)
International Subcommittee E13.15 <http://animl.sourceforge.net>.
The meeting in Dublin brought together a good mix of instrument
vendors, end users, third-party software providers, and academics.
Bones of Contention
The meeting began by bringing the participants up to speed
on the work being conducted, including a review of the efforts
of the ASTM E13.15 Subcommittee, which had not been taken into
account when the initial project proposal was drawn up. Extensive
constructive debate concerning exactly what information should
be addressed by the chemometrics standard cleared up a number
of issues that had been slowing progress. One specific issue
discussed was the proposed capability of vendor software to
export calibration models within the exchange format.
There are major business issues associated with this capability,
particularly in the food and agriculture analysis field. The
generation and distribution of just these types of calibration
models is a major source of revenue for instrument vendors,
with thousands of copies of such software sold each year.
If the capability to freely distribute these models were built
into the instrument software, allowing the models to be exported
in an open-standard format, it could undermine, if not eliminate,
the essential and profitable development work conducted to
produce such models. However, academics developing new chemometric
methods wish for exactly this functionality in order to document
their activities and compare and contrast them with those
of colleagues and the wider scientific audience.
One proposed response to these issues draws on the solution
to similar problems faced by vendors of reference spectroscopic
databases. In this case, users often want to enhance the software
by including and exporting their own reference data. The software
packages have this capability and can differentiate between
copyrighted vendor databases—which cannot be exported—and
user-generated databases—which can. Adopting this solution
would mean that vendor-supplied, commercially sensitive chemometrics
models would receive the same type of protection, while users
would be free to export calibration models that they generated
themselves in the new IUPAC/JCAMP-DX chemometrics data file
format.
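The export rule proposed here can be illustrated with a short sketch. It is not any vendor's implementation: the `Origin` enum, the `CalibrationModel` class, and the `export_model` function are hypothetical names introduced for illustration, and JSON stands in for the chemometrics exchange format, which had not yet been defined.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Origin(Enum):
    """Who produced the model (hypothetical classification)."""
    VENDOR = "vendor"  # commercially sensitive, export blocked
    USER = "user"      # generated by the user, export allowed


@dataclass
class CalibrationModel:
    name: str
    origin: Origin
    coefficients: List[float] = field(default_factory=list)


class ExportNotPermitted(Exception):
    """Raised when a protected vendor model is selected for export."""


def export_model(model: CalibrationModel) -> str:
    """Serialize a user-generated model; refuse vendor-supplied ones.

    JSON is only a placeholder for the open exchange format.
    """
    if model.origin is Origin.VENDOR:
        raise ExportNotPermitted(f"'{model.name}' is vendor-protected")
    return json.dumps({"name": model.name, "coefficients": model.coefficients})
```

In this sketch the check sits in the single export path, so a vendor-supplied model cannot be written out even if the user selects it, while user-generated models pass through unchanged.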
Process Analytical Technology
The need to document chemometrics data in a long-term, stable,
vendor-neutral format will steadily increase in the future.
This is particularly true in light of the wider adoption of
process analytical technologies in regulated industries, as
highlighted by the U.S. Food and Drug Administration’s
efforts to actively promote such technologies within the pharmaceutical
sector. This risk-based approach to pharmaceutical batch release
essentially envisages the software packages using data obtained
from the manufacturing plant to make the majority of decisions
concerning the release of a particular batch to market.
This is a major departure from the current practice, under which
a quality assurance chemist must sign a release certificate
following a series of lab tests. It is therefore essential
that the models on which the software bases its decisions
are available for scrutiny at all times and well into the
future, long after a particular product, software package,
or installation has been decommissioned. Essentially these
models, and the data fed into them to generate a decision,
will fall under the same Good Manufacturing Practice predicate
rules and 21 CFR Part 11 <www.21cfrpart11.com>
electronic-records and electronic-signature rules as do the
current analytical results and documentation within the quality
assurance environment.
Education and e-Learning
In recent years major steps have been taken to integrate e-learning
tools into normal curricula. In the chemometrics field, teachers
and trainers have been developing courses with content that
includes example calibration data files and the resulting
models. The current state of the art is such that e-learning
material often needs to be re-worked for each of the various
third-party software solutions and instrument-vendor packages.
When a standard format finally becomes available, it will
greatly ease this burden, allowing trainers to post e-learning
materials in the standard format for the trainees to download
and install on their own systems, regardless of which chemometrics
product they have standardized on [5].
NIR File Format Standardization
A number of NIR data files in IUPAC/JCAMP-DX format were examined
for compliance with the existing standards and found to require
only relatively minor changes for compatibility. Participants
also discussed what additions need to be made to the data
dictionaries already available from prior JCAMP-DX standards.
A draft recommendation is now being prepared. A second draft
on chemometrics will also include a comprehensive list of
the various pre- and post-processing algorithms commonly used
in the field. A Web site has also been created to help broaden
the discussion <www.jcamp.org>.
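A compliance check of the kind described above might begin with the file header. The sketch below is a minimal illustration, not the task group's procedure. It assumes that JCAMP-DX labeled data records take the form `##LABEL=value`, that `$$` starts a comment, that labels are compared without regard to case, spaces, hyphens, slashes, or underscores, and that TITLE, JCAMP-DX, DATA TYPE, ORIGIN, and OWNER are the required core labels. The function names and the sample header are invented for illustration.

```python
import re
from typing import Dict, List

# Core header labels assumed to be required (normalized form).
REQUIRED_LABELS = ("TITLE", "JCAMPDX", "DATATYPE", "ORIGIN", "OWNER")


def normalize_label(label: str) -> str:
    """Upper-case a label and drop spaces, hyphens, slashes, underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_header(text: str) -> Dict[str, str]:
    """Collect ##LABEL=value records, skipping $$ comments and data lines."""
    records: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("$$", 1)[0].strip()  # drop trailing comment
        if not line.startswith("##") or "=" not in line:
            continue  # not a labeled data record
        label, value = line[2:].split("=", 1)
        records[normalize_label(label)] = value.strip()
    return records


def missing_labels(text: str) -> List[str]:
    """Return the required labels that are absent or empty."""
    records = parse_header(text)
    return [label for label in REQUIRED_LABELS if not records.get(label)]


# Invented sample header; the values are placeholders, not real data.
SAMPLE = """##TITLE=Wheat flour, NIR reflectance
##JCAMP-DX=4.24 $$ version
##DATA TYPE=NEAR INFRARED SPECTRUM
##ORIGIN=Example laboratory
##XUNITS=NANOMETERS
##END=
"""

print(missing_labels(SAMPLE))  # -> ['OWNER']
```

A fuller check would also validate units, data tables, and any NIR-specific dictionary entries once these are defined in the recommendation.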
An Appeal
As with all such standards-development processes, the task
group relies heavily on input from the scientific community
and would very much encourage readers to follow the work as
it progresses and to contact the group with constructive ideas
to improve and perhaps speed up the development of these two
important recommendations.
References
1. R.J. Lancashire and A.N. Davies, The Quest for a Universal Spectroscopic Data Format, Chemistry International, 28(1), pp. 10–12, 2006.
2. R.S. McDonald and P.A. Wilks Jr., JCAMP-DX: A Standard Form for the Exchange of Infrared Spectra in Computer Readable Form, Appl. Spectrosc., 42(1), pp. 151–162, 1988.
3. A.N. Davies, A Pilot Study of the QUEST Spectral Database, in Food Spectroscopy Progress in Spectral Transfer and Database Development, ISBN: 0 9523455 4 4, p. 26, 1994.
4. A.N. Davies, The QUEST for Food Quality Control, Spectroscopy Europe, 4(3), pp. 27–28, 1992.
5. P. Lampen and A.N. Davies, JCAMP-DX to ORIGIN Utility Tools for Making Spectra Available to Chemometricians, Spectroscopy Europe, 16(5), pp. 28–30, 2004.
This article was authored by Gerard Downey (TEAGASC, Dublin, Ireland),
Douglas Rutledge (Institut National de la Recherche Agronomique,
Paris-Grignon, France), Peter Lampen (Institute for Analytical
Science, Dortmund, Germany), and Tony Davies (Waters Informatics,
Frechen, Germany). For more information, contact Tony Davies
<[email protected]>,
chairman of the Subcommittee on Electronic Data Standards.
www.iupac.org/projects/2002/2002-020-2-024.html
Page last modified 25 April 2007.
Copyright © 2003-2007 International Union of Pure and
Applied Chemistry.