MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

Background: Despite its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features remains a bottleneck: much of the collected information goes uninterpreted, and new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.

Description: Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed but are likely to occur, given known metabolites and common biochemical reactions. We use an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE), together with expert-curated reaction rules based on the Enzyme Commission classification system, to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. Nevertheless, these MINE compounds have, on average, higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases proposed annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (a median of 46 vs. 1715 candidates). Applying MINEs to LC–MS accurate mass data allowed the identity of an unknown peak to be predicted with confidence.

Conclusions: MINE databases are freely accessible for non-commercial use via user-friendly web tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification compared with general chemical databases, whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
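
The accurate-mass lookup mentioned above can be illustrated with a minimal sketch. It is not the MINE API or the authors' implementation: the names `Candidate`, `ppm_error`, and `match_candidates`, the three-compound toy list, the 5 ppm tolerance, and the restriction to a single [M+H]+ adduct are all assumptions made for illustration. The sketch only shows the general idea of filtering a candidate database by the mass error of an observed m/z value.

```python
from dataclasses import dataclass

# Mass shift for an [M+H]+ ion, in Da (proton mass). Assumed single adduct.
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one database compound."""
    name: str
    mono_mass: float  # neutral monoisotopic mass in Da


# Toy stand-in for a compound database (monoisotopic masses of real compounds).
CANDIDATES = [
    Candidate("alanine", 89.04768),
    Candidate("glucose", 180.06339),
    Candidate("caffeine", 194.08038),
]


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(observed_mz, candidates, tol_ppm=5.0,
                     adduct_shift=PROTON_MASS):
    """Return (candidate, ppm error) pairs within tolerance, best match first."""
    hits = []
    for cand in candidates:
        theoretical_mz = cand.mono_mass + adduct_shift
        err = ppm_error(observed_mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((cand, err))
    # Smallest absolute mass error first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    for cand, err in match_candidates(181.0707, CANDIDATES):
        print(f"{cand.name}: {err:+.2f} ppm")
```

A larger candidate database raises the chance that the true structure is among the hits but also lengthens the hit list for each peak, which is the trade-off the median candidate counts above (46 vs. 1715) describe.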
