An automated method for peak detection and matching in large gas chromatography‐mass spectrometry data sets

A new approach for peak detection and matching has been developed and applied to two data sets. The first consisted of the gas chromatography‐mass spectrometry (GC‐MS) chromatograms of 965 human sweat samples obtained from a population of 197 individuals. The second contained 500 synthetic chromatograms and was generated to validate the peak detection and matching methods. The size of both data sets (around 500 000 detectable peaks over all chromatograms in data set 1, and around 100 000 in data set 2) makes it infeasible to check manually whether peaks are matched correctly. The method comprises three procedures: pre‐processing of the data, peak detection, and peak matching. Peak matching consists of three stages: (a) finding potential target peaks in the full data set over all chromatograms; (b) matching peaks in the chromatograms to these targets to form clusters of spectra associated with each target; (c) merging targets where appropriate. Peak detection and matching were applied to both data sets, and the importance of stage (c) of peak matching is described. In addition to the analysis of the synthetic chromatograms, the method was validated by shuffling the original order of the sweat chromatograms and applying it independently to the shuffled data. Copyright © 2007 John Wiley & Sons, Ltd.
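The three stages of peak matching can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the matching criteria, so the retention-time tolerance `rt_tol`, the cosine-similarity threshold `min_sim`, the use of cluster centroids for merging, and all class and function names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Peak:
    chrom_id: int    # chromatogram the peak was detected in
    rt: float        # retention time (hypothetical units: minutes)
    spectrum: tuple  # intensities on a shared m/z grid


@dataclass
class Target:
    rt: float
    spectrum: tuple
    members: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two spectra on the same m/z grid."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar(rt_a, spec_a, rt_b, spec_b, rt_tol, min_sim):
    """Assumed criterion: close in retention time and alike in spectrum."""
    return abs(rt_a - rt_b) <= rt_tol and cosine(spec_a, spec_b) >= min_sim


def centroid(members):
    """Mean retention time and mean spectrum of a cluster of peaks."""
    n = len(members)
    rt = sum(p.rt for p in members) / n
    spec = tuple(sum(ch) / n for ch in zip(*(p.spectrum for p in members)))
    return rt, spec


def match_peaks(peaks, rt_tol=0.1, min_sim=0.9):
    # Stage (a): a peak that resembles no existing target becomes a new target.
    targets = []
    for p in peaks:
        if not any(similar(t.rt, t.spectrum, p.rt, p.spectrum, rt_tol, min_sim)
                   for t in targets):
            targets.append(Target(p.rt, p.spectrum))

    # Stage (b): assign every peak to its most similar admissible target.
    for p in peaks:
        candidates = [t for t in targets
                      if similar(t.rt, t.spectrum, p.rt, p.spectrum, rt_tol, min_sim)]
        if candidates:
            best = max(candidates, key=lambda t: cosine(t.spectrum, p.spectrum))
            best.members.append(p)

    # Stage (c): merge targets whose cluster centroids satisfy the same criterion.
    merged = []
    for t in targets:
        if not t.members:
            continue
        rt_c, spec_c = centroid(t.members)
        for m in merged:
            if similar(m.rt, m.spectrum, rt_c, spec_c, rt_tol, min_sim):
                m.members.extend(t.members)
                m.rt, m.spectrum = centroid(m.members)  # refresh after merging
                break
        else:
            merged.append(Target(rt_c, spec_c, list(t.members)))
    return merged
```

In this sketch, stage (c) matters because two targets created from the extremes of a drifting peak can each lie outside the tolerance of the other, yet their cluster centroids may fall within it once intermediate peaks have been assigned.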
