GC/MS-based metabolomics: development of a data mining system for metabolite identification by using soft independent modeling of class analogy (SIMCA)

Background
The goal of metabolomics analysis is a comprehensive and systematic understanding of all metabolites in a biological sample. Many useful platforms have been developed to achieve this goal. Gas chromatography coupled to mass spectrometry (GC/MS) is a well-established analytical method in metabolomics, and 200 to 500 peaks are routinely observed in a single biological sample. However, only about 100 metabolites can be identified, and the remaining peaks are left as "unknowns".

Results
We present an algorithm that extracts more extensive metabolite information from GC/MS data. Pearson's product-moment correlation coefficient and the Soft Independent Modeling of Class Analogy (SIMCA) method were combined to automatically identify and annotate unknown peaks, which tend to be missed in routine studies that rely on manual processing.

Conclusions
Our data mining system can provide a wealth of metabolite information quickly and easily, and it offers new insights, particularly into food quality evaluation and prediction.
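
The abstract names two ingredients, Pearson's correlation coefficient and SIMCA, but does not say how they are wired together. The following is a minimal sketch in Python with NumPy under that caveat, not the authors' implementation. `pearson` and `best_spectral_match` score an unknown peak's mass spectrum against reference spectra sampled on a shared m/z grid. `SimcaClassModel` fits one PCA model per class in the manner of Wold's SIMCA and compares a new object's residual variance with the class residual variance s0^2. All function and class names, the `min_r` threshold, and the fixed `cutoff` (a stand-in for the F-test critical value) are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation between two intensity vectors."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def best_spectral_match(unknown, references, min_r=0.9):
    """Score an unknown peak's spectrum against reference spectra by Pearson's r.

    `references` maps a name to an intensity vector on the same m/z grid.
    Returns (name, r) for the best match, or (None, r) when r < min_r.
    `min_r` is a hypothetical acceptance threshold.
    """
    scores = {name: pearson(unknown, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    r = scores[name]
    return (name, r) if r >= min_r else (None, r)


class SimcaClassModel:
    """A SIMCA class model: a separate PCA model fitted to one class only."""

    def __init__(self, X, n_components):
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        if not (0 <= n_components < min(p, n - 1)):
            raise ValueError("need 0 <= n_components < min(p, n - 1)")
        self.k, self.p = n_components, p
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        # Loadings: the first k right singular vectors of the centred class data.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        self.P = vt[: self.k].T  # shape (p, k)
        residuals = Xc - Xc @ self.P @ self.P.T
        # Pooled residual variance of the training objects, s0^2.
        self.s0_sq = float((residuals ** 2).sum() / ((n - self.k - 1) * (p - self.k)))

    def f_ratio(self, x):
        """Residual variance of a new object divided by the class s0^2."""
        xc = np.asarray(x, dtype=float) - self.mean
        e = xc - self.P @ (self.P.T @ xc)
        return float(e @ e / (self.p - self.k)) / self.s0_sq


def simca_classify(x, models, cutoff=3.0):
    """Return the label of the best-fitting class, or None if no class fits.

    Classical SIMCA compares the ratio with an F critical value with
    (p - k) and (n - k - 1)(p - k) degrees of freedom; the fixed `cutoff`
    here is a hypothetical stand-in for that test.
    """
    ratios = {label: m.f_ratio(x) for label, m in models.items()}
    label = min(ratios, key=ratios.get)
    return label if ratios[label] <= cutoff else None
```

For example, fitting one `SimcaClassModel` per compound class and calling `simca_classify` on an unknown peak's spectrum returns either a class label or `None`; a `None` result leaves the peak as an "unknown", which mirrors the soft, class-by-class character of SIMCA: an object may fit one class, several, or none.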