Algorithm for Reaction Classification

Reaction classification has important applications, and many classification approaches have been proposed. Our own algorithm tests all maximum common substructures (MCS) between all reactant and product molecules in order to find the atom mapping with the minimum chemical distance (MCD). Recent publications have concluded that new MCS algorithms need to be compared with existing methods in a reproducible environment, preferably on a generalized test set. However, few test sets are available, and they are not truly representative of the range of reactions found in real reaction databases. We have designed a challenging test set of reactions and are making it publicly available for use with InfoChem's software or other classification algorithms. The set comprises representative example reactions, grouped by level of difficulty and drawn from a large number of reaction databases that chemists encounter in practice. It is intended to demonstrate the basic requirements that a mapping algorithm must meet to detect reaction centers consistently. We invite the scientific community to contribute to the future extension and improvement of this data set, with the goal of establishing a common standard.
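
To make the minimum chemical distance criterion concrete, the following is a minimal sketch and not the MCS-based algorithm described above. It assumes a simplified chemical distance (the sum of absolute bond-order changes between mapped atom pairs, with hydrogens and free electrons ignored) and replaces the MCS search with a brute-force enumeration of element-preserving atom mappings, which is only feasible for very small reactions. The function names and the dictionary representation of molecules are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def chemical_distance(r_bonds, p_bonds, mapping):
    """Simplified chemical distance: total absolute change in bond order
    between reactant and product under a reactant->product atom mapping."""
    # Relabel reactant bonds with the product atom ids they map to.
    mapped = {frozenset(mapping[a] for a in pair): order
              for pair, order in r_bonds.items()}
    pairs = set(mapped) | set(p_bonds)
    return sum(abs(mapped.get(p, 0) - p_bonds.get(p, 0)) for p in pairs)


def min_distance_mapping(r_atoms, r_bonds, p_atoms, p_bonds):
    """Brute-force search over element-preserving bijections.
    Returns (distance, mapping) for one mapping of minimum distance."""
    if sorted(r_atoms.values()) != sorted(p_atoms.values()):
        raise ValueError("reaction is not balanced in heavy atoms")
    r_ids = sorted(r_atoms)
    best = None
    for perm in permutations(sorted(p_atoms)):
        # Skip mappings that would change an atom's element.
        if any(r_atoms[r] != p_atoms[p] for r, p in zip(r_ids, perm)):
            continue
        mapping = dict(zip(r_ids, perm))
        d = chemical_distance(r_bonds, p_bonds, mapping)
        if best is None or d < best[0]:
            best = (d, mapping)
    return best


def bond(a, b):
    """Hypothetical helper: an undirected bond key."""
    return frozenset((a, b))


# Heavy atoms only: acetic acid + methanol -> methyl acetate + water.
r_atoms = {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"}
r_bonds = {bond(1, 2): 1, bond(2, 3): 2, bond(2, 4): 1, bond(5, 6): 1}
p_atoms = {1: "C", 2: "C", 3: "O", 4: "O", 5: "C", 6: "O"}
p_bonds = {bond(1, 2): 1, bond(2, 3): 2, bond(2, 4): 1, bond(4, 5): 1}

distance, mapping = min_distance_mapping(r_atoms, r_bonds, p_atoms, p_bonds)
print(distance, mapping)
```

In this example the minimum distance is 2 (one bond broken, one bond formed), but more than one mapping reaches it: cleaving the acid C-O bond and cleaving the methanol C-O bond score equally under this simplified measure. Ties of this kind are one reason a mapping algorithm needs additional rules to identify reaction centers consistently.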
