Automated reaction mapping

Automated reaction mapping is a fundamental first step in the analysis of chemical reactions and opens the door to the development of sophisticated chemical kinetic tools. This article formulates the reaction mapping problem as an optimization problem and shows that it is NP-complete for general graphs. Five algorithms based on canonical graph naming and enumerative combinatorial techniques are developed to solve the problem. Unlike previous formulations, which are based on limited configurations or classifications, our algorithms can optimally map any reaction that can be represented as a set of chemical graphs. This generality comes from using Graph Isomorphism directly as the basis for the algorithms, rather than the more commonly used Maximum Common Subgraph. Experimental results on chemical and biological reaction databases demonstrate the efficiency of our algorithms.
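To make the optimization view concrete, the following is a minimal brute-force sketch, not one of the five algorithms developed in the article. It assumes that the objective is the number of bonds broken or formed under an atom mapping, which the abstract does not state explicitly, and it ignores bond orders. The function names `bond_changes` and `best_mapping` and the toy inputs are hypothetical, chosen only for illustration.

```python
from itertools import permutations, product as cartesian


def bond_changes(reactant_bonds, product_bonds, mapping):
    """Count bonds broken or formed under a reactant -> product atom mapping."""
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in reactant_bonds}
    target = {frozenset(bond) for bond in product_bonds}
    # Bonds present on only one side are either broken or formed.
    return len(mapped ^ target)


def best_mapping(reactant_atoms, reactant_bonds, product_atoms, product_bonds):
    """Enumerate every element-preserving bijection and keep the cheapest one.

    Assumed cost: number of changed bonds. The search is exponential in the
    number of atoms per element, so it is only usable on toy inputs.
    """
    if sorted(reactant_atoms) != sorted(product_atoms):
        raise ValueError("reaction is not balanced")
    elements = sorted(set(reactant_atoms))
    r_groups = [[i for i, e in enumerate(reactant_atoms) if e == el] for el in elements]
    p_groups = [[i for i, e in enumerate(product_atoms) if e == el] for el in elements]
    best_cost, best = None, None
    # One permutation of product atoms per element, combined across elements.
    for choice in cartesian(*(permutations(group) for group in p_groups)):
        mapping = {}
        for r_group, p_perm in zip(r_groups, choice):
            mapping.update(zip(r_group, p_perm))
        cost = bond_changes(reactant_bonds, product_bonds, mapping)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, mapping
    return best_cost, best


# Same C-C-O skeleton on both sides, numbered differently (hydrogens omitted).
same_cost, same_map = best_mapping(
    ["C", "C", "O"], [(0, 1), (1, 2)],
    ["O", "C", "C"], [(0, 1), (1, 2)],
)

# H2 + Cl2 -> 2 HCl: every element-preserving mapping changes four bonds.
hcl_cost, _ = best_mapping(
    ["H", "H", "Cl", "Cl"], [(0, 1), (2, 3)],
    ["H", "Cl", "H", "Cl"], [(0, 1), (2, 3)],
)
```

In the first toy case the cheapest mapping has cost zero, which amounts to finding an isomorphism between two differently numbered copies of the same graph. Exhaustive enumeration like this grows factorially with the number of atoms of each element, which is the cost that canonical naming is meant to avoid.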
