Metabolite identification through multiple kernel learning on fragmentation trees

Motivation: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures.

Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidates list.

Contact: huibin.shen@aalto.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
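The kernel combination step mentioned under Results can be illustrated with a small sketch. It is not the authors' implementation: it assumes one simple multiple kernel learning heuristic, in which each base kernel is weighted in proportion to its centered alignment with a target kernel built from the fingerprint labels. The toy data, the two linear base kernels, and all function names (`center_kernel`, `alignment`, `align_weights`, `combine`) are hypothetical stand-ins for the fragmentation tree kernels described in the abstract.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered alignment: normalized Frobenius inner product of centered kernels."""
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def align_weights(kernels, K_target):
    """Weight each base kernel by its (non-negative) alignment with the target."""
    scores = np.array([max(alignment(K, K_target), 0.0) for K in kernels])
    total = scores.sum()
    if total == 0.0:
        # Fall back to a uniform combination if no kernel aligns with the target.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / total

def combine(kernels, weights):
    """Weighted sum of base kernels; stays positive semidefinite for weights >= 0."""
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data (assumption): 40 compounds, 3 binary fingerprint bits coded as -1/+1.
rng = np.random.default_rng(0)
Y = rng.choice([-1.0, 1.0], size=(40, 3))

# Two hypothetical feature views: one informative about Y, one pure noise.
X_informative = Y + 0.1 * rng.standard_normal(Y.shape)
X_noise = rng.standard_normal((40, 5))

kernels = [X_informative @ X_informative.T, X_noise @ X_noise.T]
K_target = Y @ Y.T  # target kernel derived from the fingerprint labels

weights = align_weights(kernels, K_target)
K_combined = combine(kernels, weights)
print("kernel weights:", weights)
```

The combined matrix `K_combined` could then be passed to any kernel method that accepts a precomputed kernel, for example one binary classifier per fingerprint bit. In this toy setup the informative kernel should receive most of the weight, which is the behavior a learned combination is meant to provide over a uniform sum.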
