Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning

Predicting catalyst selectivity

Asymmetric catalysis is widely used in chemical research and manufacturing to access just one of two possible mirror-image products. Nonetheless, the process of tuning catalyst structure to optimize selectivity is still largely empirical. Zahrt et al. present a framework for more efficient, predictive optimization. As a proof of principle, they focused on a known coupling reaction of imines and thiols catalyzed by chiral phosphoric acid compounds. By modeling multiple conformations of more than 800 prospective catalysts, and then training machine-learning algorithms on a subset of experimental results, they achieved highly accurate predictions of enantioselectivities. Science, this issue p. eaau5631

A model encompassing multiple conformations of chiral phosphoric acid catalysts accurately predicts enantioselectivities.

INTRODUCTION

The development of new synthetic methods in organic chemistry is traditionally accomplished through empirical optimization. Catalyst design, wherein experimentalists attempt to qualitatively identify correlations between catalyst structure and catalyst efficiency, is no exception. However, this approach is plagued by numerous deficiencies, including the lack of mechanistic understanding of a new transformation, the inherent limits of human cognition in finding patterns in large collections of data, and the lack of quantitative guidelines to aid catalyst identification.
Chemoinformatics provides an attractive alternative to empiricism for several reasons: mechanistic information is not a prerequisite; catalyst structures can be characterized by three-dimensional (3D) descriptors (numerical representations of molecular properties derived from the 3D molecular structure) that quantify the steric and electronic properties of thousands of candidate molecules; and the suitability of a given catalyst candidate can be quantified by comparing its properties with a computationally derived model trained on experimental data. Accurately predicting a selective catalyst from a training set of less-than-optimal (that is, lower-selectivity) data remains a major goal for machine learning in asymmetric catalysis. We report a method to achieve this goal and propose a more efficient alternative to traditional catalyst design.

RATIONALE

The workflow we have created consists of the following components: (i) construction of an in silico library comprising a large collection of conceivable, synthetically accessible catalysts derived from a particular scaffold; (ii) calculation of relevant chemical descriptors for each catalyst in the library; (iii) selection of a representative subset of the catalysts [this subset is termed the universal training set (UTS) because it is agnostic to reaction or mechanism and thus can be used to optimize any reaction catalyzed by that scaffold]; (iv) collection of the training data; and (v) application of machine learning methods to generate models that predict the enantioselectivity of each member of the in silico library. These models are evaluated with an external test set of catalysts, that is, by predicting the selectivities of catalysts outside the training data. The validated models can then be used to select the optimal catalyst for a given reaction.
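The RATIONALE does not say how the representative subset in step (iii) is chosen, beyond its being based on steric and electronic descriptors. The sketch below fills that gap for illustration only and is not the authors' procedure: it substitutes a simple greedy max-min (farthest-point) rule that z-scores each descriptor column and then repeatedly adds the catalyst farthest from everything already selected. The function names, catalyst labels, and two-value descriptor vectors are hypothetical; a real library would carry many more descriptor columns, and the protocol figure's principal-component step is skipped here.

```python
import math
from typing import Dict, List, Sequence


def standardize(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Z-score each descriptor column so steric and electronic terms share a scale."""
    columns = list(zip(*vectors))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v[i] - means[i]) / stds[i] for i in range(len(means))] for v in vectors]


def select_training_set(descriptors: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Pick k catalysts that spread across descriptor space.

    Assumption: greedy max-min (farthest-point) sampling stands in for the
    unspecified UTS selection method; it is not taken from the source.
    """
    names = sorted(descriptors)
    if not 0 < k <= len(names):
        raise ValueError("k must be between 1 and the library size")
    scaled = dict(zip(names, standardize([descriptors[n] for n in names])))

    def dist(a: str, b: str) -> float:
        return math.dist(scaled[a], scaled[b])

    # Seed with the catalyst nearest the library mean (the origin after z-scoring).
    first = min(names, key=lambda n: math.hypot(*scaled[n]))
    chosen = [first]
    # For every candidate, track the distance to its closest already-chosen catalyst.
    nearest = {n: dist(n, first) for n in names}
    while len(chosen) < k:
        # Add the candidate that is farthest from everything chosen so far.
        nxt = max((n for n in names if n not in chosen), key=lambda n: nearest[n])
        chosen.append(nxt)
        for n in names:
            nearest[n] = min(nearest[n], dist(n, nxt))
    return chosen


if __name__ == "__main__":
    # Hypothetical library: two descriptor values per catalyst
    # (for example, one steric and one electronic term).
    library = {
        "cat_A": [0.0, 0.0],
        "cat_B": [0.5, 0.5],
        "cat_C": [4.0, 4.0],
        "cat_D": [-4.0, 3.0],
        "cat_E": [-0.5, -7.5],
    }
    print(select_training_set(library, 3))
```

On this toy library the call prints `['cat_A', 'cat_E', 'cat_C']`: cat_A sits exactly at the library mean and seeds the set, and its near-duplicate cat_B is left out because it adds little new coverage of descriptor space. Max-min sampling is used here only because it is short and deterministic.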
RESULTS

To demonstrate the viability of our method, we predicted reaction outcomes with substrate combinations and catalysts different from those in the training data and simulated a situation in which highly selective reactions had not been achieved. In the first demonstration, a model was constructed by using support vector machines and validated with three different external test sets. The first test set evaluated the model's ability to predict the selectivity of reactions forming new products, using only catalysts from the training set. The model performed well, with a mean absolute deviation (MAD) of 0.161 kcal/mol. Next, the same model was used to predict the selectivity of an external test set of catalysts with substrate combinations from the training set. The model remained highly accurate, with a MAD of 0.211 kcal/mol. Lastly, reactions forming new products with the external test catalysts were predicted with a MAD of 0.236 kcal/mol. In the second study, no reactions with selectivities above 80% enantiomeric excess (ee) were included in the training data. Deep feed-forward neural networks accurately reproduced the experimental selectivity data and successfully predicted the most selective reactions. More notably, the models correctly identified the general trends in selectivity, as judged by average catalyst selectivity. Despite omitting about half of the experimental free energy range from the training data, we could still make accurate predictions in this region of selectivity space.

CONCLUSION

The capability to predict selective catalysts has the potential to shift how chemists select and optimize chiral catalysts, from an empirically guided to a mathematically guided approach.

Chemoinformatics-guided optimization protocol. (A) Generation of a large in silico library of catalyst candidates. (B) Calculation of robust chemical descriptors. (C) Selection of a UTS. (D) Acquisition of experimental selectivity data. (E) Application of machine learning to use moderate- to low-selectivity reactions to predict high-selectivity reactions. R, any group; X, O or S; Y, OH, SH, or NHTf; PC, principal component; ΔΔG, mean selectivity.

Catalyst design in asymmetric reaction development has traditionally been driven by empiricism, wherein experimentalists attempt to qualitatively recognize structural patterns to improve selectivity. Machine learning algorithms and chemoinformatics can potentially accelerate this process by recognizing otherwise inscrutable patterns in large datasets. Herein we report a computationally guided workflow for chiral catalyst selection using chemoinformatics at every stage of development. Robust molecular descriptors that are agnostic to the catalyst scaffold allow for selection of a universal training set on the basis of steric and electronic properties. This set can be used to train machine learning methods to build highly accurate predictive models over a broad range of selectivity space. Using support vector machines and deep feed-forward neural networks, we demonstrate accurate predictive modeling in the chiral phosphoric acid–catalyzed thiol addition to N-acylimines.
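The RESULTS mix two selectivity scales: the second study's training cutoff is stated in % ee, while model errors are reported as MADs in kcal/mol. The helpers below convert between them with the standard relation ΔΔG‡ = RT ln[(1 + ee)/(1 - ee)], which assumes the enantiomer ratio reflects the ratio of the two competing rates. The default temperature of 298.15 K is an assumption (the reaction temperature is not given in this section), and the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) divided by 4184 J/kcal.
R_KCAL = 8.314462618 / 4184.0


def ee_to_ddg(ee: float, temperature_k: float = 298.15) -> float:
    """Convert enantiomeric excess (a fraction in (-1, 1)) to ddG in kcal/mol.

    Assumption: kinetic control, so ddG = RT * ln(er) with
    er = (1 + ee) / (1 - ee); 298.15 K is only a placeholder default.
    """
    if not -1.0 < ee < 1.0:
        raise ValueError("ee must lie strictly between -1 and 1")
    er = (1.0 + ee) / (1.0 - ee)
    return R_KCAL * temperature_k * math.log(er)


def ddg_to_ee(ddg: float, temperature_k: float = 298.15) -> float:
    """Inverse of ee_to_ddg: ee = tanh(ddG / (2RT))."""
    return math.tanh(ddg / (2.0 * R_KCAL * temperature_k))


def mean_absolute_deviation(predicted, experimental) -> float:
    """Mean absolute deviation between predicted and measured ddG values (kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


if __name__ == "__main__":
    cutoff = ee_to_ddg(0.80)
    print(f"80% ee at 298.15 K corresponds to {cutoff:.3f} kcal/mol")
```

At 298.15 K the 80% ee cutoff works out to about 1.30 kcal/mol (the script prints `1.302`), so the reported MADs of 0.161 to 0.236 kcal/mol are well under the free-energy gap that separates a racemic outcome from that cutoff.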
