Compound identification using random projection location-sensitive hash for gas chromatography-mass spectrometry

Many compound identification methods currently rely on mass spectral similarity matching, in which cosine correlation and its composite measures serve as the similarity measures for mass spectra. Several composite similarity measures perform much better than the plain cosine, especially the weighted cosine (WC) measure. In this work, we introduce random projection location-sensitive hash as a similarity algorithm for mass spectra and use it to identify compounds: over multiple projections, we compute the average Hamming distance between the binary codes of the replicate data and the binary codes of the reference data. To evaluate the method, the National Institute of Standards and Technology (NIST) mass spectral library was used as the reference database and the replicate database as the query data. The experimental results show that query and reference spectra with peak intensity weighting always outperform the non-weighted query and reference spectra. With repeated projections, the random projection location-sensitive hash performs almost identically to the weighted cosine (WC) measure, which reaches a top accuracy of 84% in similarity search with the optimal weight factors of (0.53, 1.3).
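The matching procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the code length (`n_bits`), the number of repeated projections (`n_repeats`), and the toy spectra are all hypothetical. The sketch also assumes that the weight factors (0.53, 1.3) are exponents applied to peak intensity and m/z respectively, which the abstract does not state explicitly.

```python
import numpy as np

def weight_spectrum(mz, intensity, a=0.53, b=1.3):
    """Peak weighting intensity**a * mz**b (assumed meaning of the weight factors)."""
    return (intensity ** a) * (mz ** b)

def binary_codes(vectors, planes):
    """Sign random projection: one bit per random hyperplane."""
    return (vectors @ planes.T) >= 0

def mean_hamming(query, references, n_bits=64, n_repeats=20, seed=0):
    """Average normalized Hamming distance between the query code and each
    reference code over several independent random projections."""
    rng = np.random.default_rng(seed)
    total = np.zeros(len(references))
    for _ in range(n_repeats):
        # Draw a fresh set of Gaussian hyperplanes for every repetition.
        planes = rng.standard_normal((n_bits, query.shape[0]))
        q_code = binary_codes(query[None, :], planes)[0]
        r_codes = binary_codes(references, planes)
        total += (r_codes != q_code).mean(axis=1)
    return total / n_repeats

# Toy usage: 5 sparse hypothetical reference spectra on a shared m/z grid.
rng = np.random.default_rng(1)
mz = np.arange(1, 101, dtype=float)
ref_intensity = rng.random((5, 100)) * (rng.random((5, 100)) < 0.3)
query_intensity = 2.0 * ref_intensity[2]  # rescaled replicate of reference 2

refs = weight_spectrum(mz, ref_intensity)
query = weight_spectrum(mz, query_intensity)
distances = mean_hamming(query, refs)
print("ranked reference indices:", np.argsort(distances))
```

For Gaussian hyperplanes, the probability that a single bit differs between two spectra equals the angle between them divided by π, so the averaged Hamming distance grows monotonically as the cosine similarity of the weighted spectra falls. This is consistent with the reported near-equivalence to the WC measure once enough projections are averaged.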
