Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification

Electrospray tandem mass spectrometry (ESI-MS/MS) is commonly used in high-throughput metabolomics. A key obstacle to the effective use of this technology is the difficulty of interpreting measured spectra to identify metabolites accurately and efficiently. Traditional methods for automated metabolite identification compare the target MS or MS/MS spectrum to the spectra in a reference database, ranking candidates by the closeness of the match. However, the limited coverage of available databases has led to interest in computational methods for predicting reference MS/MS spectra from chemical structures. This work proposes a probabilistic generative model for the MS/MS fragmentation process, which we call competitive fragmentation modeling (CFM), and a machine learning approach for learning the parameters of this model from MS/MS data. We show that CFM can be used both in an MS/MS spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and in a putative metabolite identification task (i.e., ranking possible structures for a target MS/MS spectrum). In the MS/MS spectrum prediction task, CFM shows significantly improved performance compared to a full enumeration of all peaks corresponding to substructures of the molecule. In the metabolite identification task, CFM obtains substantially better rankings for the correct candidate than existing methods (MetFrag and FingerID) on tripeptide and metabolite data, when querying PubChem or KEGG for candidate structures of similar mass.
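The identification task described above, ranking candidate structures for a target spectrum, can be illustrated with a minimal sketch. It assumes that a predicted spectrum is already available for each candidate and scores candidates by the fraction of target intensity that falls within a mass tolerance of a predicted peak. The function names, the scoring rule, and the 0.01 Da tolerance are illustrative assumptions, not the scoring used by CFM, MetFrag, or FingerID.

```python
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) pairs. This representation is an
# assumption made for the sketch, not a format taken from the paper.
Spectrum = List[Tuple[float, float]]


def matched_intensity_score(target: Spectrum, predicted: Spectrum,
                            tol: float = 0.01) -> float:
    """Fraction of target intensity explained by predicted peaks within tol (Da)."""
    total = sum(intensity for _, intensity in target)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in target
        if any(abs(mz - pmz) <= tol for pmz, _ in predicted)
    )
    return matched / total


def rank_candidates(target: Spectrum,
                    predicted_by_candidate: Dict[str, Spectrum],
                    tol: float = 0.01) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from best to worst match."""
    scored = [
        (name, matched_intensity_score(target, spectrum, tol))
        for name, spectrum in predicted_by_candidate.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: three candidates with made-up predicted spectra.
target_spectrum = [(100.0, 50.0), (150.0, 30.0), (200.0, 20.0)]
candidates = {
    "A": [(100.0, 1.0), (150.005, 1.0)],
    "B": [(200.0, 1.0)],
    "C": [],
}
ranking = rank_candidates(target_spectrum, candidates)
```

In practice the candidate set would come from a mass-based query against a structure database such as PubChem or KEGG, and the predicted spectra from a fragmentation model; both steps are outside the scope of this sketch.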
