Compound identification using partial and semipartial correlations for gas chromatography-mass spectrometry data.

Compound identification is a key step in the analysis of gas chromatography-mass spectrometry (GC-MS) data. Currently, the most widely used identification approach is mass spectrum matching, in which the dot product and its composite version serve as spectral similarity measures. Several transformations of fragment ion intensities have also been proposed to increase identification accuracy. In this study, we introduce partial and semipartial correlations as mass spectral similarity measures and apply them to compound identification in combination with different transformations of peak intensity. Mixture versions of the proposed measures were also developed to further improve accuracy. To evaluate the proposed similarity measures, the National Institute of Standards and Technology (NIST) mass spectral library and replicate spectral library were used as the reference library and the source of query spectra, respectively. In all cases, the mixture partial and semipartial correlations outperformed both the dot product and its composite measure. The mixture similarity with semipartial correlation achieved the highest identification accuracy, 84.6%, with a transformation of (0.53, 1.3) applied to fragment ion intensity and m/z value, respectively.
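The similarity measures named above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the transformation (0.53, 1.3) is read as power exponents on intensity and m/z, the partial and semipartial correlations use the standard first-order formulas, and the conditioning spectrum `other` is a hypothetical choice of control variable. The function names and peak values are invented for illustration, and the composite and mixture measures are not shown.

```python
import math

def transform(mz, intensity, a=0.53, b=1.3):
    """Weight each peak as intensity**a * mz**b (assumed form of the transformation)."""
    return [(i ** a) * (m ** b) for m, i in zip(mz, intensity)]

def cosine(x, y):
    """Normalized dot product of two equal-length intensity vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def semipartial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed from y only."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt(1 - ryz ** 2)

# Hypothetical spectra on a shared m/z grid (values invented for illustration).
mz = [41, 43, 57, 71, 85]
query = transform(mz, [30, 100, 80, 40, 10])
reference = transform(mz, [25, 100, 85, 35, 12])
other = transform(mz, [100, 20, 5, 60, 3])  # assumed conditioning spectrum

dot_score = cosine(query, reference)
partial_score = partial_corr(query, reference, other)
semipartial_score = semipartial_corr(query, reference, other)
print(dot_score, partial_score, semipartial_score)
```

In a library search, each score would be computed between the query and every reference spectrum, and the top-ranked reference would be reported as the identification. How the conditioning spectrum is chosen is a design decision that this sketch leaves open.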
