A Structure-Based Platform for Predicting Chemical Reactivity

Summary: Despite their enormous potential, machine learning methods have found only limited application in predicting reaction outcomes, because current models are often highly complex and, most importantly, are not transferable between problem sets. Here, we present a structure-based machine learning platform for diverse applications in organic chemistry. To this end, we developed an input based on multiple fingerprint features (MFFs), a versatile molecular representation that proved applicable across a range of diverse problem sets. First, molecular properties were predicted accurately across a diverse array of molecules. Next, reaction outcomes such as stereoselectivities and yields were predicted for experimental datasets that had previously been evaluated using (complex) problem-oriented descriptor models. As a final application, a systematic high-throughput dataset was investigated as a “real-world problem,” and good correlation was observed when using the structure-based model.
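To make the idea of a multiple-fingerprint input more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an MFF vector is built by concatenating several different fingerprints of the same molecule, and that a reaction is encoded by concatenating the vectors of its components; the summary does not spell out either detail. The fingerprint used here (hashed character n-grams of a SMILES string) is a toy stand-in for real cheminformatics fingerprints, and all function names are hypothetical.

```python
import zlib


def hashed_ngram_fingerprint(smiles, n, n_bits=64):
    """Toy fingerprint (assumption, not a real chemical fingerprint):
    set one bit for every character n-gram found in the SMILES string."""
    bits = [0] * n_bits
    for i in range(len(smiles) - n + 1):
        fragment = smiles[i:i + n]
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


def multi_fingerprint_features(smiles, ngram_sizes=(1, 2, 3), n_bits=64):
    """Concatenate several fingerprints of one molecule into one vector.
    Assumption: this concatenation mimics the 'multiple fingerprint' idea."""
    features = []
    for n in ngram_sizes:
        features.extend(hashed_ngram_fingerprint(smiles, n, n_bits))
    return features


def reaction_features(component_smiles, ngram_sizes=(1, 2, 3), n_bits=64):
    """Encode a reaction by concatenating the vectors of its components
    (for example substrate and catalyst), in a fixed order."""
    features = []
    for smiles in component_smiles:
        features.extend(multi_fingerprint_features(smiles, ngram_sizes, n_bits))
    return features


# Hypothetical inputs: ethanol alone, then ethanol paired with benzene.
ethanol_vector = multi_fingerprint_features("CCO")
pair_vector = reaction_features(["CCO", "c1ccccc1"])
print(len(ethanol_vector), len(pair_vector))
```

A fixed-length vector of this kind can be passed to any standard regression model; the structure-only input is what would let the same pipeline be reused across property, selectivity, and yield datasets.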
