A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures.

Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification on the test dataset. To imitate real identification experiments, the NIST MS main library was used as the reference library and the test dataset as the search data. Our study shows that, when RI and mass spectral similarity are integrated for compound identification, the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively. Compared with mass spectral matching alone, using both RI and mass spectral matching improves the identification accuracy by 1.7%, 3.5%, and 3.5% for the three similarity measures, respectively. We conclude that the improvement that RI matching brings to compound identification depends heavily on the MS similarity measure used and on the accuracy of the RI data.
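As a rough illustration of how an RI threshold can be combined with spectral matching, the sketch below discards library candidates whose RI differs from the query RI by more than a threshold and ranks the rest by spectral similarity. It rests on several assumptions that the abstract does not confirm: the RI threshold is applied as a hard filter, plain cosine similarity stands in for the NIST composite, DFT.R, and DWT.D measures, and the function names, data layout, and toy spectra are hypothetical. Only the 15 i.u. threshold is taken from the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two spectra stored as {m/z: intensity} dicts.

    Assumption: this is a simple stand-in, not one of the three measures
    evaluated in the study.
    """
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query_spectrum, query_ri, library, ri_threshold):
    """Return the name of the best-scoring candidate inside the RI window.

    Assumption: the RI threshold acts as a hard filter before ranking.
    Returns None if no candidate falls inside the window.
    """
    best_name, best_score = None, -1.0
    for entry in library:
        # Skip candidates whose RI is too far from the query RI.
        if abs(entry["ri"] - query_ri) > ri_threshold:
            continue
        score = cosine_similarity(query_spectrum, entry["spectrum"])
        if score > best_score:
            best_name, best_score = entry["name"], score
    return best_name


# Hypothetical toy data: "A" has the closer spectrum but a distant RI,
# "B" has a slightly different spectrum and a nearby RI.
library = [
    {"name": "A", "ri": 1100.0, "spectrum": {43: 100.0, 57: 80.0, 71: 40.0}},
    {"name": "B", "ri": 1010.0, "spectrum": {43: 100.0, 57: 75.0, 71: 45.0}},
]
query_spectrum = {43: 100.0, 57: 80.0, 71: 40.0}
query_ri = 1000.0

spectrum_only = identify(query_spectrum, query_ri, library, float("inf"))
with_ri = identify(query_spectrum, query_ri, library, 15.0)
```

With an infinite window the search reduces to spectral matching alone and picks "A"; with the 15 i.u. window, "A" is excluded and "B" is returned. A narrower window removes more false candidates but also risks excluding the true compound when its library RI is inaccurate, which is why the threshold has to be tuned per similarity measure.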
