Automated pipeline for de novo metabolite identification using mass-spectrometry-based metabolomics.

Metabolite identification is one of the biggest bottlenecks in metabolomics. Identifying human metabolites poses experimental, analytical, and computational challenges. Here we present a pipeline of previously developed cheminformatic tools and demonstrate how it facilitates metabolite identification using solely LC/MS(n) data. These tools process, annotate, and compare MS(n) data, and propose candidate structures for unknown metabolites either by identity assignment of identical mass spectral trees or by de novo identification using substructures of similar trees. The operation and performance of this metabolite identification pipeline are demonstrated by applying it to LC/MS(n) data of urine samples. From human urine, 30 MS(n) trees of unknown metabolites were acquired, processed, and compared to a reference database containing MS(n) data of known metabolites. Of these 30 unknowns, 10 could be assigned a putative identity by finding identical fragmentation trees. For 11 unknowns, no similar fragmentation trees were found in the reference database. On the basis of elemental composition alone, a large number of candidate structures were possible, so these unknowns remained unidentified. The other 9 unknowns were also not found in the database, but metabolites with similar fragmentation trees were retrieved. Computer-assisted structure elucidation was performed for these 9 unknowns: for 4 of them we could perform de novo identification and propose a limited number of candidate structures, and for the other 5 the structure generation process could not be constrained far enough to yield a small list of candidates. The novelty of this work is that it allows de novo identification of metabolites that are not present in a database by using MS(n) data and computational tools.
We expect this pipeline to be the basis for the computer-assisted identification of new metabolites in future metabolomics studies, and foresee that further additions will allow the identification of an even larger fraction of the unknown metabolites.
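
The three outcomes described above (identity assignment for identical trees, de novo identification for similar trees, and no identification when nothing similar is found) can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the representation of a fragmentation tree as a set of fragment labels, the Tanimoto score, the `similar_cutoff` value, and all names and toy data are assumptions made purely for illustration.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fragment labels."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def triage(unknown: frozenset, reference: dict, similar_cutoff: float = 0.5):
    """Route one unknown based on its best match in the reference set.

    Returns (route, best_reference_name, best_score). The cutoff is a
    hypothetical value, not one taken from the paper.
    """
    best_name, best_score = None, 0.0
    for name, frags in reference.items():
        score = tanimoto(unknown, frags)
        if score > best_score:
            best_name, best_score = name, score

    if best_score == 1.0:
        # Identical tree: propose the reference compound as putative identity.
        return "identity_assignment", best_name, best_score
    if best_score >= similar_cutoff:
        # Similar tree: shared substructures could constrain structure generation.
        return "de_novo_candidate", best_name, best_score
    # Nothing similar: elemental composition alone leaves too many candidates.
    return "unidentified", None, best_score


# Toy reference database and unknowns (invented fragment labels).
reference = {
    "ref_A": frozenset({"C6H5", "C2H4O", "CO2"}),
    "ref_B": frozenset({"C3H7N", "H2O", "NH3"}),
}
unknowns = {
    "u1": frozenset({"C6H5", "C2H4O", "CO2"}),  # identical to ref_A
    "u2": frozenset({"C6H5", "C2H4O", "CH3"}),  # partially overlaps ref_A
    "u3": frozenset({"C5H5N", "HCN"}),          # no overlap with any reference
}

routes = {uid: triage(frags, reference)[0] for uid, frags in unknowns.items()}
counts = Counter(routes.values())
print(routes)
print(counts)
```

In the study reported here, the corresponding tallies were 10 identity assignments, 9 unknowns with similar trees (4 of which yielded a limited list of candidate structures), and 11 unknowns without any similar tree, which together account for all 30 unknowns.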
