Learning To Predict Reaction Conditions: Relationships between Solvent, Molecular Structure, and Catalyst

Reaction databases provide a great deal of useful information to assist in planning experiments, but they do not provide any interpretation or chemical concepts to accompany this information. In this work, reactions are labeled with experimental conditions, and network analysis shows that consistencies within clusters of data points can be leveraged to organize this information. In particular, the analysis shows how particular experimental conditions (specifically solvent) are effective in enabling specific organic reactions (Friedel-Crafts, Aldol addition, Claisen condensation, Diels-Alder, and Wittig), including variations within each reaction class. An example of network analysis is shown in the graphical abstract, where data points for a Claisen condensation reaction break into clusters that depend on the catalyst and chemical structure. This type of clustering, which mimics how a chemist reasons, is derived directly from the network. The findings of this work could therefore augment synthesis planning by providing predictions in a fashion that mimics human chemists. To numerically evaluate solvent prediction ability, three methods are compared: network analysis (through the k-nearest neighbor algorithm), a support vector machine, and a deep neural network. Network analysis is the most accurate method in four of the five test cases, with deep neural networks also showing good prediction scores. The network analysis tool was evaluated by an expert panel of chemists, who generally agreed that the algorithm produced accurate solvent choices while remaining transparent about the underlying reasons for its predictions.
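
As a rough illustration of the k-nearest neighbor idea mentioned above, the sketch below predicts a solvent by majority vote among the most similar labeled reactions. It is not the authors' implementation: the representation of each reaction as a set of "on" fingerprint bits, the use of Tanimoto similarity, the function names `tanimoto` and `predict_solvent`, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_solvent(query, labeled, k=3):
    """Return the majority solvent among the k labeled reactions most similar
    to the query, together with those neighbors (the evidence for the vote)."""
    ranked = sorted(labeled, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(solvent for _, solvent in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical toy data: (fingerprint bits, solvent label). Not real reactions.
labeled = [
    (frozenset({1, 2, 3, 4}), "THF"),
    (frozenset({1, 2, 3, 5}), "THF"),
    (frozenset({1, 2, 4, 6}), "ethanol"),
    (frozenset({7, 8, 9}), "toluene"),
    (frozenset({7, 8, 10}), "toluene"),
]

query = frozenset({1, 2, 3})
solvent, neighbors = predict_solvent(query, labeled, k=3)
print(solvent)
for bits, label in neighbors:
    print(sorted(bits), label, round(tanimoto(query, bits), 2))
```

Returning the neighbors alongside the prediction reflects the transparency point in the abstract: each suggested solvent can be traced back to the specific similar reactions that voted for it.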
