Compound Identification Using Random Projection for Gas Chromatography-Mass Spectrometry Data

In general, compound identification through library searching is performed in the original mass spectral space using an established similarity measure. In this paper, the original mass spectral space is transformed into a binary space by random projection, and the Hamming distance between the query and reference vectors in the binary space is calculated. The NIST Mass Spectral Library 2005 (NIST05) main library is used as the reference database and the replicate library as the query data. The accuracy of compound identification increases with the number of binary digits. When the number is set to 2076 bits, random projection achieves better identification performance than the three similarity measures it is compared against.
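The idea of projecting spectra into a binary space and ranking by Hamming distance can be illustrated with a minimal sketch. The abstract does not specify how the projection is constructed, so the code below assumes sign random projection with Gaussian hyperplanes, which is one common choice and not necessarily the authors' method. The function names, the toy spectra, and the 500-channel spectrum length are all hypothetical; only the 2076-bit code length is taken from the text.

```python
import numpy as np

def make_projection(n_mz, n_bits, seed=0):
    # Assumption: each bit comes from one random Gaussian hyperplane.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_mz, n_bits))

def to_binary(spectra, projection):
    # Project intensity vectors and keep only the sign of each projection.
    return (np.atleast_2d(spectra) @ projection) >= 0

def hamming_search(query_bits, library_bits):
    # Hamming distance = number of differing bits per library entry.
    distances = np.count_nonzero(library_bits != query_bits, axis=1)
    return int(np.argmin(distances)), distances

# Toy data (synthetic, not NIST05): 50 sparse reference "spectra".
rng = np.random.default_rng(1)
n_mz, n_bits = 500, 2076
library = rng.random((50, n_mz)) * (rng.random((50, n_mz)) < 0.1)
# Hypothetical replicate measurement of entry 7 with small additive noise.
query = library[7] + 0.02 * rng.random(n_mz)

projection = make_projection(n_mz, n_bits)
library_bits = to_binary(library, projection)
query_bits = to_binary(query, projection)[0]

best, distances = hamming_search(query_bits, library_bits)
```

In this sketch the same projection matrix must be applied to both the library and the query, since the binary codes are only comparable when they share hyperplanes. Under the sign-projection assumption, the fraction of differing bits approximates the angle between two spectra, so longer codes give a less noisy estimate, which is consistent with the reported gain in accuracy as the number of bits grows.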
