iMet: A computational tool for structural annotation of unknown metabolites from tandem mass spectra

Untargeted metabolomic studies are revealing large numbers of naturally occurring metabolites that cannot be characterized because their chemical structures and MS/MS spectra are not available in databases. Here we present iMet, a computational tool based on experimental tandem mass spectrometry that enables the annotation of previously undiscovered metabolites. iMet uses MS/MS spectra to identify known metabolites that are structurally similar to an unknown metabolite, and reports the net atomic addition or removal that converts the known metabolite into the unknown one. We validate the algorithm on 148 metabolites and show that, for 89% of them, at least one of the top four matches identified by iMet enables proper annotation of the unknown metabolite. iMet is freely available at http://imet.seeslab.net.
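The second part of iMet's output, the net atomic addition or removal that turns a known metabolite into the unknown one, comes down to a difference between two molecular formulas. The sketch below illustrates only that arithmetic. It is not iMet's implementation, and it makes two assumptions: the unknown's molecular formula is already available, and both formulas are flat (no parentheses, isotopes or charges). The function names `parse_formula`, `net_atomic_change` and `format_change` are hypothetical.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional atom count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula such as 'C6H12O6' into element counts.

    Assumption: no parentheses, hydrates, isotope labels or charges.
    """
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        # Any gap between tokens means characters we cannot interpret.
        if match.start() != pos:
            raise ValueError(f"unparseable formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"unparseable formula: {formula!r}")
    return counts


def net_atomic_change(known: str, unknown: str) -> dict:
    """Atoms to add (positive) or remove (negative) to turn `known` into `unknown`."""
    diff = parse_formula(unknown)
    diff.subtract(parse_formula(known))  # counts may go negative
    # Keep only elements that actually change, in alphabetical order.
    return {el: n for el, n in sorted(diff.items()) if n != 0}


def format_change(change: dict) -> str:
    """Render a change such as {'H': -2} as '-H2'."""
    parts = []
    for element, n in change.items():
        sign = "+" if n > 0 else "-"
        count = abs(n)
        parts.append(f"{sign}{element}{count if count > 1 else ''}")
    return " ".join(parts) if parts else "(no change)"
```

For example, comparing glucose (C6H12O6) with glucose 6-phosphate (C6H13O9P) gives `{'H': 1, 'O': 3, 'P': 1}`, rendered as `+H +O3 +P`. That is the net HPO3 gained on phosphorylation. Reporting only the net change, rather than a full reaction, matches how the result is described above. The structural similarity ranking that selects candidates from MS/MS spectra is a separate step, and this sketch leaves it out.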
