A Universal Density Matrix Functional from Molecular Orbital-Based Machine Learning: Transferability across Organic Molecules

We address the degree to which machine learning (ML) can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for second-order Møller-Plesset perturbation theory (MP2), coupled cluster with singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 millihartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 vs 5000 training calculations).
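To make the workflow concrete, the sketch below illustrates the general pattern of regressing a correlation energy from molecular-orbital-based pair features with Gaussian process regression, as MOB-ML does. This is a minimal, assumption-laden illustration rather than the authors' implementation: the synthetic feature vectors, array shapes, kernel choice, and the use of scikit-learn's GaussianProcessRegressor are stand-ins for the actual MOB-ML features and models.

```python
# Minimal sketch of a MOB-ML-style regression workflow (illustrative only).
# Assumptions: synthetic random features stand in for localized-orbital pair
# quantities, and scikit-learn's GaussianProcessRegressor stands in for the
# Gaussian process models used in the paper.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical training data: each row is a feature vector built from
# molecular-orbital pair quantities of a training geometry; each target is
# that pair's correlation energy contribution from a reference calculation
# (e.g., MP2 or CCSD), in hartree.
n_pairs, n_features = 200, 12
X_train = rng.normal(size=(n_pairs, n_features))
y_train = rng.normal(scale=1e-3, size=n_pairs)

# A single Gaussian process regressor maps pair features to pair energies.
gp = GaussianProcessRegressor(
    kernel=Matern(nu=2.5) + WhiteKernel(noise_level=1e-8),
    normalize_y=True,
)
gp.fit(X_train, y_train)

# For a new geometry, predict every orbital-pair contribution and sum them
# to obtain the correlation energy that is added to the Hartree-Fock energy.
X_new = rng.normal(size=(30, n_features))  # pairs of one test molecule
e_corr_pred = gp.predict(X_new).sum()
print(f"Predicted correlation energy: {e_corr_pred:.6f} hartree")
```

In this per-pair formulation, a model trained on a small number of reference calculations can be evaluated on molecules of different size and composition, which is the transferability property tested in the seven-to-thirteen-heavy-atom experiment described above.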
