Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?

As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, Go champions, there is interest – and hope – that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited. In particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.
