A Short Review of Chemical Reaction Database Systems, Computer‐Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility

This article is the text for a pedagogical lecture to be given at the Strasbourg Summer School in Chemoinformatics in June 2014. It covers a very wide range of reaction topics, including structure and reaction representation, reaction centers, atom‐to‐atom mapping, reaction retrieval systems, computer‐aided synthesis design, retrosynthesis, reaction prediction and synthetic feasibility. In the time available, the coverage of each topic can only be cursory; the main usefulness of this article to the research community is the extensive bibliography.

[1]  Clara D. Christ,et al.  Mining Electronic Laboratory Notebooks: Analysis, Retrosynthesis, and Reaction Based Enumeration , 2012, J. Chem. Inf. Model..

[2]  Guenter Grethe,et al.  Algorithm for Reaction Classification , 2013, J. Chem. Inf. Model..

[3]  W. L. Jorgensen,et al.  Computer-assisted mechanistic evaluation of organic reactions. 1. Overview , 1980 .

[4]  Kimito Funatsu,et al.  SOPHIA, a Knowledge Base-Guided Reaction Prediction System - Utilization of a Knowledge Base Derived from a Reaction Database , 1995, J. Chem. Inf. Comput. Sci..

[5]  Pierre Baldi,et al.  No Electron Left Behind: A Rule-Based Expert System To Predict Chemical Reactions and Reaction Mechanisms , 2009, J. Chem. Inf. Model..

[6]  James B. Hendrickson,et al.  COGNOS: A Beilstein-Type System for Organizing Organic Reactions , 1995, J. Chem. Inf. Comput. Sci..

[7]  Alexandre Varnek,et al.  Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures , 2005, J. Comput. Aided Mol. Des..

[8]  Johann Gasteiger,et al.  Automatic Extraction of Chemical Knowledge from Organic Reaction Data: Addition of Carbon-Hydrogen Bonds to Carbon-Carbon Double Bonds , 1995 .

[9]  David Z. Chen,et al.  Automatic reaction mapping and reaction center detection , 2013 .

[10]  Valerie J. Gillet,et al.  SPROUT: 3D Structure Generation Using Templates , 1995, J. Chem. Inf. Comput. Sci..

[11]  N. Zefirov An approach to systematization and design of organic reactions , 1987 .

[12]  J. Gasteiger,et al.  Enabling the exploration of biochemical pathways. , 2004, Organic & biomolecular chemistry.

[13]  Guido Sello,et al.  Reaction classification by similarity: the influence of steric congestion , 1998 .

[14]  Johann Gasteiger,et al.  EROS A computer program for generating sequences of reactions , 1978 .

[15]  W. Todd Wipke,et al.  Artificial intelligence in organic synthesis. SST: starting material selection strategies. An application of superstructure search , 1984, J. Chem. Inf. Comput. Sci..

[16]  B. Grzybowski,et al.  Parallel optimization of synthetic pathways within the network of organic chemistry. , 2012, Angewandte Chemie.

[17]  Johann Gasteiger,et al.  Simulation of Organic Reactions: From the Degradation of Chemicals to Combinatorial Synthesis , 2000, J. Chem. Inf. Comput. Sci..

[18]  Andreas Barth,et al.  Status and future developments of reaction databases and online retrieval systems , 1990, J. Chem. Inf. Comput. Sci..

[19]  William Lingran Chen,et al.  Over 20 Years of Reaction Access Systems from MDL: A Novel Reaction Substructure Search Algorithm , 2002, J. Chem. Inf. Comput. Sci..

[20]  Ivar Ugi,et al.  Computer assistance in the design of syntheses and a new generation of computer programs for the solution of chemical problems by molecular logic , 1988 .

[21]  Arthur Dalby,et al.  Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited , 1992, J. Chem. Inf. Comput. Sci..

[22]  Matthew H Todd,et al.  Computer-aided organic synthesis. , 2005, Chemical Society reviews.

[23]  Edward S. Blurock,et al.  Reaction: System for Modeling Chemical Reactions , 1995, J. Chem. Inf. Comput. Sci..

[24]  Johann Gasteiger,et al.  HORACE: An automatic system for the hierarchical classification of chemical reactions , 1994, Journal of chemical information and computer sciences.

[25]  Serge S. Tratch,et al.  Symbolic equations and their applications to reaction design , 1991 .

[26]  Valerie J. Gillet,et al.  SPROUT: Recent developments in the de novo design of molecules , 1994, J. Chem. Inf. Comput. Sci..

[27]  Peter Willett,et al.  Maximum common subgraph isomorphism algorithms for the matching of chemical structures , 2002, J. Comput. Aided Mol. Des..

[28]  Matthias Rarey,et al.  Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review , 2011 .

[29]  P. Judson,et al.  Knowledge-Based Expert Systems in Chemistry , 2009 .

[30]  Kimito Funatsu,et al.  Automatic recognition of reaction site in organic chemical reactions , 1988 .

[31]  James B. Hendrickson,et al.  The Variety of Thermal Pericyclic Reactions , 1974 .

[32]  S. Krishnan,et al.  Simulation and Evaluation of Chemical Synthesis - SECS: An Application of Artificial Intelligence Techniques , 1978, Artif. Intell..

[33]  J. Gasteiger,et al.  Automated derivation of reaction rules for the EROS 6.0 system for reaction prediction , 1990 .

[34]  René Barone,et al.  Computer‐Assisted Synthesis Design (CASD) , 2008 .

[35]  Johann Gasteiger,et al.  Automatic Determination of Reaction Mappings and Reaction Center Information. 2. Validation on a Biochemical Reaction Database , 2008, J. Chem. Inf. Model..

[36]  Pierre Baldi,et al.  ReactionMap: An Efficient Atom-Mapping Algorithm for Chemical Reactions , 2013, J. Chem. Inf. Model..

[37]  Gilles Marcou,et al.  Mining Chemical Reactions Using Neighborhood Behavior and Condensed Graphs of Reactions Approaches , 2012, J. Chem. Inf. Model..

[38]  Peter Willett,et al.  The Evaluation of an Automatically Indexed, Machine-Readable Chemical Reactions File , 1980, Journal of chemical information and computer sciences.

[39]  Peter Willett,et al.  Modern approaches to chemical reaction searching: proceedings of a conference , 1986 .

[40]  G. Tozer-Hotchkiss Theilheimer's Synthetic Methods of Organic Chemistry , 2011 .

[41]  Pascal Bonnet,et al.  Is chemical synthetic accessibility computationally predictable for drug and lead-like molecules? A comparative assessment between medicinal and computational chemists. , 2012, European journal of medicinal chemistry.

[42]  J. B. Hendrickson,et al.  Systematic characterization of structures and reactions for use in organic synthesis , 1971 .

[43]  G. Smith,et al.  SECS—Simulation and Evaluation of Chemical Synthesis: Strategy and Planning , 1977 .

[44]  Tatsuya Akutsu,et al.  Efficient extraction of mapping rules of atoms from enzymatic reaction data. , 2004 .

[45]  Johann Gasteiger,et al.  A Collection of Computer Methods for Synthesis Design and Reaction Prediction , 2010 .

[46]  Ivar Ugi,et al.  Interactive generation of organic reactions by IGOR 2 and the PC-assisted discovery of a new reaction , 1988 .

[47]  John M. Barnard,et al.  Substructure searching methods: Old and new , 1993, J. Chem. Inf. Comput. Sci..

[48]  Johann Gasteiger,et al.  Structure and reaction based evaluation of synthetic accessibility , 2007, J. Comput. Aided Mol. Des..

[49]  Dragos Horvath,et al.  Models for Identification of Erroneous Atom-to-Atom Mapping of Reactions Performed by Automated Algorithms , 2012, J. Chem. Inf. Model..

[50]  Guenter Grethe,et al.  International chemical identifier for reactions (RInChI) , 2013, Journal of Cheminformatics.

[51]  Dinesh P. Mehta,et al.  An Open-Source Java Platform for Automated Reaction Mapping , 2010, J. Chem. Inf. Model..

[52]  Philip Judson,et al.  Knowledge-based expert systems in chemistry: not counting on computers , 2009 .

[53]  Shinsaku Fujita Canonical numbering and coding of reaction center graphs and reduced reaction center graphs abstracted from imaginary transition structures. A novel approach to the linear coding of reaction types , 1988, J. Chem. Inf. Comput. Sci..

[54]  Dinesh P. Mehta,et al.  Automated reaction mapping , 2009, JEAL.

[55]  Alexander J. Lawson,et al.  The Beilstein Database , 2008 .

[56]  Rainer Herges,et al.  Reaction planning: Computer-aided reaction design , 1988 .

[57]  W T Wipke,et al.  Automatic knowledge base building for the organic synthesis design program (SECS). , 1989, Progress in clinical and biological research.

[58]  Johann Gasteiger,et al.  De novo design and synthetic accessibility , 2007, J. Comput. Aided Mol. Des..

[59]  Valerie J. Gillet,et al.  SPROUT, HIPPO and CAESA: Tools for de novo structure generation and estimation of synthetic accessibility , 1995 .

[60]  Jonathan Goodman,et al.  Computer Software Review: Reaxys , 2009, J. Chem. Inf. Model..

[61]  Rainer Herges,et al.  Computer-assisted solution of chemical problems: the historical development and the present state of the art of a new discipline of chemistry , 1993 .

[62]  Johann Gasteiger,et al.  COMPUTER-ASSISTED DESIGN OF SYNTHESES FOR HETEROCYCLIC COMPOUNDS , 1995 .

[63]  William L. Jorgensen,et al.  Computer Assisted Mechanistic Evaluations of Organic Reactions. 26. Diastereoselective Additions: Cram's Rule , 1995 .

[64]  I. Tetko,et al.  ISIDA - Platform for Virtual Screening Based on Fragment and Pharmacophoric Descriptors , 2008 .

[65]  Martin A. Ott,et al.  Computer tools for reaction retrieval and synthesis planning in organic chemistry. A brief review of their history, methods, and programs , 1992 .

[66]  Chyouhwa Chen,et al.  Building and refining a knowledge base for synthetic organic chemistry via the methodology of inductive and deductive machine learning , 1990, J. Chem. Inf. Comput. Sci..

[67]  James Dugundji,et al.  An algebraic model of constitutional chemistry as a basis for chemical computer programs , 1973 .

[68]  Tudor I. Oprea,et al.  Rapid Evaluation of Synthetic and Molecular Complexity for in Silico Chemistry , 2005, J. Chem. Inf. Model..

[69]  Lingran Chen,et al.  Reaction Classification and Knowledge Acquisition , 2008 .

[70]  Peter D. Karp,et al.  Accurate Atom-Mapping Computation for Biochemical Reactions , 2012, J. Chem. Inf. Model..

[71]  Juho Rousu,et al.  Computing Atom Mappings for Biochemical Reactions without Subgraph Isomorphism , 2011, J. Comput. Biol..

[72]  J. Gasteiger,et al.  Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network , 1997 .

[73]  Christodoulos A. Floudas,et al.  Stereochemically Consistent Reaction Mapping and Identification of Multiple Reaction Mechanisms through Integer Linear Optimization , 2012, J. Chem. Inf. Model..

[74]  E J Corey,et al.  Computer-assisted design of complex organic syntheses. , 1969, Science.

[75]  Rainer Herges Organizing Principle of Complex Reactions and Theory of Coarctate Transition States , 1994 .

[76]  Stephen R. Heller,et al.  InChI - the worldwide chemical structure identifier standard , 2013, Journal of Cheminformatics.

[77]  Igor I. Baskin,et al.  SYMBEQ Program and Its Application in Computer-Assisted Reaction Design , 1994, J. Chem. Inf. Comput. Sci..

[78]  Kimito Funatsu,et al.  A Novel Method for Characterization of Three-Dimensional Reaction Fields Based on Electrostatic and Steric Interactions toward the Goal of Quantitative Analysis and Understanding of Organic Reactions , 1999, J. Chem. Inf. Comput. Sci..

[79]  J. Gasteiger,et al.  Computer-assisted reaction prediction and synthesis design , 1990 .

[80]  Johann Gasteiger,et al.  Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites , 1998, J. Chem. Inf. Comput. Sci..

[81]  Guido Sello,et al.  Classification of organic reactions using similarity , 1997 .

[82]  Shinsaku Fujita Canonical numbering and coding of imaginary transition structures. A novel approach to the linear coding of individual organic reactions , 1988, J. Chem. Inf. Comput. Sci..

[83]  James B. Hendrickson,et al.  Comprehensive System for Classification and Nomenclature of Organic Reactions , 1997, J. Chem. Inf. Comput. Sci..

[84]  D. Horvath,et al.  ISIDA Property‐Labelled Fragment Descriptors , 2010, Molecular informatics.

[85]  J. F. Arens A formalism for the classification and design of organic reactions. I. The class of (− +)n reactions , 2010 .

[86]  Engelbert Zass Databases of Chemical Reactions , 2008 .

[87]  Peter F. Stadler,et al.  A Graph-Based Toy Model of Chemistry , 2003, J. Chem. Inf. Comput. Sci..

[88]  Keith T. Taylor,et al.  ROBIA: a reaction prediction program. , 2005, Organic letters.

[89]  Johann Gasteiger,et al.  Computer‐Assisted Planning of Organic Syntheses: The Second Generation of Programs , 1996 .

[90]  J Gasteiger,et al.  Decision support systems for chemical structure representation, reaction modeling, and spectra simulation , 2002, SAR and QSAR in environmental research.

[91]  Edward S. Blurock,et al.  Detailed Mechanism Generation. 2. Aldehydes, Ketones, and Olefins , 2004, J. Chem. Inf. Model..

[92]  G. É. Vléduts,et al.  Concerning one system of classification and codification of organic reactions , 1963, Inf. Storage Retr..

[93]  Pierre Baldi,et al.  ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning , 2012, J. Chem. Inf. Model..

[94]  Yang Liu,et al.  Route Designer: A Retrosynthetic Analysis Tool Utilizing Automated Retrosynthetic Rule Generation , 2009, J. Chem. Inf. Model..

[95]  J C Baber,et al.  Predicting synthetic accessibility: application in drug discovery and development. , 2004, Mini reviews in medicinal chemistry.

[96]  Johannes M. Bauer IGOR2: a PC-program for generating new reactions and molecular structures , 1989 .

[97]  Wendy A. Warr,et al.  Representation of chemical structures , 2011 .

[98]  Bilge Baytekin,et al.  Estimating chemical reactivity and cross-influence from collective chemical knowledge , 2012 .

[99]  Peter Ertl,et al.  Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions , 2009, J. Cheminformatics.

[100]  Rainer Herges Reaction planning: prediction of new organic reactions , 1990, J. Chem. Inf. Comput. Sci..

[101]  William Lingran Chen,et al.  Self-Contained Sequence Representation: Bridging the Gap between Bioinformatics and Cheminformatics , 2011, J. Chem. Inf. Model..

[102]  A. Johnson,et al.  Molecular complexity analysis of de novo designed ligands. , 2006, Journal of medicinal chemistry.

[103]  Shinsaku Fujita,et al.  Description of organic reactions based on imaginary transition structures. 1. Introduction of new concepts , 1986, J. Chem. Inf. Comput. Sci..

[104]  Michael F. Lynch,et al.  The Automatic Detection of Chemical Reaction Sites , 1978, J. Chem. Inf. Comput. Sci..

[105]  J Gasteiger,et al.  A combined application of reaction prediction and infrared spectra simulation for the identification of degradation products of s-triazine herbicides. , 2001, Chemistry.

[106]  Oliver Kohlbacher,et al.  Using Atom Mapping Rules for an Improved Detection of Relevant Routes in Weighted Metabolic Networks , 2008, J. Comput. Biol..

[107]  Peter Willett,et al.  Use of a maximum common subgraph algorithm in the automatic identification of ostensible bond changes occurring in chemical reactions , 1981, J. Chem. Inf. Comput. Sci..

[108]  David Bawden,et al.  Classification of chemical reactions: potential, possibilities and continuing relevance , 1991, J. Chem. Inf. Comput. Sci..

[109]  Kimito Funatsu,et al.  A Novel Approach to Retrosynthetic Analysis Using Knowledge Bases Derived from Reaction Databases , 1999, J. Chem. Inf. Comput. Sci..

[110]  Guido Sello,et al.  Reaction prediction: the suggestions of the Beppe program , 1992, J. Chem. Inf. Comput. Sci..

[111]  Anthony P. F. Cook,et al.  Computer‐aided synthesis design: 40 years on , 2012 .

[112]  Pierre Baldi,et al.  Learning to Predict Chemical Reactions , 2011, J. Chem. Inf. Model..

[113]  Stephen R. Heller,et al.  Current Status and Future Development in Relation to IUPAC Activities , 2013 .

[114]  Oliver Kohlbacher,et al.  MetaRoute: fast search for relevant metabolic routes for interactive network navigation and visualization , 2008, Bioinform..

[115]  Edward S. Blurock,et al.  Detailed Mechanism Generation. 1. Generalized Reactive Properties as Reaction Class Substructures , 2004, J. Chem. Inf. Model..

[116]  Alexandre Varnek,et al.  Stochastic versus Stepwise Strategies for Quantitative Structure-Activity Relationship Generation: How Much Effort May the Mining for Successful QSAR Models Take? , 2007, J. Chem. Inf. Model..

[117]  W. L. Jorgensen,et al.  CAMEO: a program for the logical prediction of the products of organic reactions , 1990 .

[118]  Edward S. Blurock Computer-aided synthesis design at RISC-Linz: automatic extraction and use of reaction classes , 1990, J. Chem. Inf. Comput. Sci..

[119]  Joannis Apostolakis,et al.  Automatic Determination of Reaction Mappings and Reaction Center Information. 1. The Imaginary Transition State Energy Approach , 2008, J. Chem. Inf. Model..

[120]  P. Baldi,et al.  Synthesis Explorer: A Chemical Reaction Tutorial System for Organic Synthesis Design and Mechanism Prediction , 2008 .

[121]  James E. Blake,et al.  CASREACT: more than a million reactions , 1990, J. Chem. Inf. Comput. Sci..

[122]  James B. Hendrickson Systematic Signatures for Organic Reactions , 2010, J. Chem. Inf. Model..

[123]  Guenter Grethe,et al.  Analysis of Reaction Information , 2008 .