Representations and Strategies for Transferable Machine Learning Models in Chemical Discovery

Strategies for machine learning (ML)-accelerated discovery that generalize across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. General representations and transferable ML models that leverage known relationships in existing data will accelerate discovery by addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension of the graph-based revised autocorrelation (RAC) representation, the eRAC, that incorporates the effective nuclear charge alongside the nuclear charge heuristic, which on its own overestimates the dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate that eRACs and this transfer learning strategy act synergistically to consistently improve model performance. Analysis of these models shows that the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.
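To make the point about overestimated dissimilarity concrete, the following is a minimal sketch of a metal-centered product autocorrelation on a toy molecular graph. It is not the authors' implementation. The function names, the star-shaped toy complexes (a metal bonded to six bare donor atoms), and the L1 feature distance are illustrative assumptions. In particular, the valence-electron count (nuclear charge minus core electrons) is used here only as a crude stand-in for an effective nuclear charge; the actual effective nuclear charge values used in eRACs are not given in this text.

```python
from collections import deque

def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def centered_autocorrelation(adjacency, symbols, prop, start, max_depth):
    """Sum of prop[start] * prop[j] over atoms j that are d bonds from `start`,
    for d = 0 .. max_depth (a start-centered product autocorrelation)."""
    dist = bond_distances(adjacency, start)
    feats = [0.0] * (max_depth + 1)
    for atom, d in dist.items():
        if d <= max_depth:
            feats[d] += prop[symbols[start]] * prop[symbols[atom]]
    return feats

def star_complex(metal, donor, n_donors=6):
    """Hypothetical toy graph: atom 0 is the metal, atoms 1..n are bare donor atoms."""
    symbols = [metal] + [donor] * n_donors
    adjacency = {0: list(range(1, n_donors + 1))}
    for i in range(1, n_donors + 1):
        adjacency[i] = [0]
    return adjacency, symbols

def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Nuclear charges (atomic numbers).
NUCLEAR_CHARGE = {"Fe": 26, "Ru": 44, "N": 7, "P": 15}
# ASSUMPTION: valence-electron count as a crude proxy for effective nuclear charge.
VALENCE_PROXY = {"Fe": 8, "Ru": 8, "N": 5, "P": 5}

fe_n = star_complex("Fe", "N")  # 3d metal, 2p donor
ru_p = star_complex("Ru", "P")  # 4d metal, 3p donor (isovalent analogue)

z_fe = centered_autocorrelation(*fe_n, NUCLEAR_CHARGE, start=0, max_depth=1)
z_ru = centered_autocorrelation(*ru_p, NUCLEAR_CHARGE, start=0, max_depth=1)
v_fe = centered_autocorrelation(*fe_n, VALENCE_PROXY, start=0, max_depth=1)
v_ru = centered_autocorrelation(*ru_p, VALENCE_PROXY, start=0, max_depth=1)

print("nuclear-charge distance:", l1_distance(z_fe, z_ru))
print("valence-proxy distance: ", l1_distance(v_fe, v_ru))
```

In this toy setting the nuclear-charge features place the two isovalent complexes far apart, while the valence-proxy features make them identical. A representation that carries both kinds of feature lets a model weigh periodic-table similarity against row-specific differences, which is the qualitative behavior the abstract attributes to eRACs.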
