Predicting reaction performance in C–N cross-coupling using machine learning

A guide for catalyst choice in the forest

Chemists often discover reactions by applying catalysts to a series of simple compounds. Tweaking those reactions to tolerate more structural complexity in pharmaceutical research is time-consuming. Ahneman et al. report that machine learning can help. Using a high-throughput data set, they trained a random forest algorithm to predict which specific palladium catalysts would best tolerate isoxazoles (cyclic structures with an N–O bond) during C–N bond formation. The predictions also helped to guide analysis of the catalyst inhibition mechanism.

Science, this issue p. 186

A random forest algorithm trained on high-throughput data predicts which catalysts best tolerate certain heterocycles.

Machine learning methods are becoming integral to scientific inquiry in numerous disciplines. We demonstrated that machine learning can be used to predict the performance of a synthetic reaction in multidimensional chemical space using data obtained via high-throughput experimentation. We created scripts to compute and extract atomic, molecular, and vibrational descriptors for the components of a palladium-catalyzed Buchwald–Hartwig cross-coupling of aryl halides with 4-methylaniline in the presence of various potentially inhibitory additives. Using these descriptors as inputs and reaction yield as output, we showed that a random forest algorithm provides significantly improved predictive performance over linear regression analysis. The random forest model was also successfully applied to sparse training sets and out-of-sample prediction, suggesting its value in facilitating adoption of synthetic methodology.
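The comparison described in the abstract (descriptors in, yield out, random forest versus linear regression) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline, descriptors, or data. `make_toy_data` is a hypothetical generator of random "descriptors" whose toy yield depends on a threshold interaction between two of them; `fit_linear`, `fit_forest`, `best_split`, `build_tree`, `predict_tree`, `r2`, and `compare` are illustrative helpers, and the forest hyperparameters (`n_trees`, `max_depth`, `min_leaf`, `mtry`) are arbitrary assumptions. Both models are scored by R² on freshly generated held-out rows, a simpler setting than the out-of-sample prediction the abstract refers to.

```python
import random

def make_toy_data(n, rng, n_desc=4):
    # HYPOTHETICAL stand-in for descriptor rows and yields (not real data): only
    # descriptors 0 and 1 matter, and the toy yield is high only when both > 0.5.
    X, y = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(n_desc)]
        base = 80.0 if (x[0] > 0.5 and x[1] > 0.5) else 20.0
        X.append(x)
        y.append(base + rng.gauss(0.0, 5.0))  # add measurement-like noise
    return X, y

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan solve)."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]  # [A^T A | A^T y]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    w = [M[i][p] / M[i][i] for i in range(p)]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def best_split(X, y, features, min_leaf):
    """Return (sse, feature, threshold) with the lowest summed squared error."""
    n, best = len(y), None
    tot_s, tot_q = sum(y), sum(t * t for t in y)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        s = q = 0.0  # running sum and sum of squares of the left child (k rows)
        for k in range(1, n):
            t = y[order[k - 1]]
            s, q = s + t, q + t * t
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi or k < min_leaf or n - k < min_leaf:
                continue
            sse = (q - s * s / k) + ((tot_q - q) - (tot_s - s) ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, f, (lo + hi) / 2.0)
    return best

def build_tree(X, y, depth, max_depth, min_leaf, mtry, rng):
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean  # leaf: predict the mean toy yield
    feats = rng.sample(range(len(X[0])), mtry)  # random feature subset per split
    split = best_split(X, y, feats, min_leaf)
    if split is None:
        return mean
    _, f, thr = split
    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    if not left or not right:
        return mean
    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, mtry, rng)
    return (f, thr, grow(left), grow(right))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, max_depth=8, min_leaf=3, mtry=2, seed=0):
    """Bagged regression trees with per-split random feature subsets."""
    rng, n = random.Random(seed), len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, mtry, rng))
    return lambda x: sum(predict_tree(t, x) for t in trees) / len(trees)

def r2(model, X, y):
    mean = sum(y) / len(y)
    ss_res = sum((t - model(x)) ** 2 for x, t in zip(X, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def compare(n_train, n_test=300, seed=1):
    rng = random.Random(seed)
    X_tr, y_tr = make_toy_data(n_train, rng)
    X_te, y_te = make_toy_data(n_test, rng)  # held-out rows, never used for fitting
    return (r2(fit_linear(X_tr, y_tr), X_te, y_te),
            r2(fit_forest(X_tr, y_tr), X_te, y_te))

if __name__ == "__main__":
    for n_train in (40, 400):  # a sparse and a larger toy training set
        lr, rf = compare(n_train)
        print(f"n_train={n_train}: linear R^2 = {lr:.2f}, random forest R^2 = {rf:.2f}")
```

Because a threshold-type interaction cannot be written as a weighted sum of descriptors, the linear model's held-out R² on this toy problem stays well below the forest's; rerunning `compare` with a small `n_train` gives a rough feel for how each model degrades when training data are sparse.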
