CLCA: Maximum Common Molecular Substructure Queries within the MetRxn Database

The challenge of automatically identifying the molecular moieties preserved in a chemical reaction is referred to as the atom mapping problem. Reaction atom maps make it possible to trace the fate of individual atoms across an entire metabolic network. They are used to track atoms in isotope labeling experiments for metabolic flux elucidation, to trace novel biosynthetic routes to a target compound, and to contrast entire pathways for structural homology. However, rapid computation of reaction atom mappings remains elusive despite significant research. We present a novel substructure search algorithm, canonical labeling for clique approximation (CLCA), which has polynomial run-time complexity and quickly generates atom maps for all the reactions present in MetRxn. CLCA uses number theory (i.e., prime factorization) to generate canonical labels, or unique IDs, and to identify a bijection between the vertices (atoms) of two distinct molecular graphs. The molecular graphs are generated by combining atomistic information on reactions and metabolites from 112 metabolic models and 8 metabolic databases. CLCA offers improvements in run time, accuracy, and memory utilization over existing heuristic and combinatorial maximum common substructure (MCS) search algorithms. We provide detailed examples of the advantages, as well as the failure modes, of CLCA relative to existing algorithms.
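To make the idea of prime-based labels concrete, the following is a minimal sketch of a generic prime-product refinement scheme; it is not the CLCA procedure itself, whose details are not given in this abstract. It assumes a molecule is represented as a list of element symbols plus a list of bonds (index pairs), and the function names `first_primes`, `refine_labels`, and `map_atoms` are hypothetical. Each atom's label is refined by multiplying the primes assigned to its neighbors' current ranks; by unique factorization, that product identifies the multiset of neighbor ranks. When every atom ends up with a distinct rank, the ranks suggest a candidate bijection between two molecular graphs, which the sketch then verifies against elements and bonds.

```python
from math import prod

def first_primes(n):
    """Return the first n primes by trial division (fine for small molecules)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def _rank(keys):
    """Replace each key by its position among the sorted distinct keys."""
    order = {k: i for i, k in enumerate(sorted(set(keys)))}
    return [order[k] for k in keys]

def refine_labels(elements, bonds):
    """Iteratively refine atom ranks until the partition stops changing."""
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    primes = first_primes(n)
    ranks = _rank(elements)  # initial classes: one per element symbol
    while True:
        # Product of neighbor primes encodes the multiset of neighbor ranks.
        keys = [(ranks[i], prod(primes[ranks[j]] for j in neighbors[i]))
                for i in range(n)]
        new_ranks = _rank(keys)
        if new_ranks == ranks:
            return ranks
        ranks = new_ranks

def map_atoms(elements_a, bonds_a, elements_b, bonds_b):
    """Return a verified atom bijection A -> B, or None if ranks are tied
    (symmetry) or the candidate mapping does not preserve elements and bonds."""
    n = len(elements_a)
    ranks_a = refine_labels(elements_a, bonds_a)
    ranks_b = refine_labels(elements_b, bonds_b)
    if len(elements_b) != n or len(set(ranks_a)) != n or len(set(ranks_b)) != n:
        return None
    index_b = {r: i for i, r in enumerate(ranks_b)}
    mapping = {i: index_b[r] for i, r in enumerate(ranks_a)}
    same_elements = all(elements_a[i] == elements_b[j] for i, j in mapping.items())
    mapped_bonds = {frozenset((mapping[a], mapping[b])) for a, b in bonds_a}
    same_bonds = mapped_bonds == {frozenset(b) for b in bonds_b}
    return mapping if same_elements and same_bonds else None

# Ethanol heavy atoms in two different atom orderings: C-C-O and O-C-C.
print(map_atoms(["C", "C", "O"], [(0, 1), (1, 2)],
                ["O", "C", "C"], [(0, 1), (1, 2)]))  # {0: 2, 1: 1, 2: 0}
```

In this sketch, molecules with symmetric atoms (for example the two terminal carbons of propane) yield tied ranks, so `map_atoms` returns `None` rather than guessing; a real atom mapper would need an additional tie-breaking step, and a reaction mapper would further need to handle partial (substructure) matches rather than whole-graph isomorphism.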
