Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach.

Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible for molecular sets significantly larger than those used for training. For thermochemical properties of up to 16k isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post-Hartree-Fock methods at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated by using semiempirical quantum chemistry and machine learning models trained on 1% and 10% of 134k organic molecules to reproduce enthalpies of all remaining molecules at the density functional theory level of accuracy.
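The composite strategy described above (a cheap baseline value plus a learned correction toward the expensive target) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the regressor is a small kernel ridge model with a Gaussian kernel, the "molecule" is a single scalar descriptor x, and cheap_method and expensive_method are hypothetical toy functions standing in for a low-cost and a high-cost quantum method.

```python
import math
import random


def gaussian_kernel(a, b, sigma):
    """Similarity between two scalar descriptors (toy choice of kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_delta_model(xs, baseline, target, sigma=0.5, lam=1e-6):
    """Learn the difference (target - baseline) with kernel ridge regression.

    Returns a function that adds the learned correction to a baseline value.
    """
    deltas = [t - b for t, b in zip(target, baseline)]
    K = [[gaussian_kernel(xi, xj, sigma) + (lam if i == j else 0.0)
          for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    alphas = solve_linear(K, deltas)

    def predict(x, baseline_value):
        correction = sum(a * gaussian_kernel(x, xi, sigma)
                         for a, xi in zip(alphas, xs))
        return baseline_value + correction

    return predict


# Hypothetical stand-ins: NOT real quantum chemistry methods.
def cheap_method(x):
    return math.sin(x)


def expensive_method(x):
    return math.sin(x) + 0.4 + 0.3 * math.cos(1.5 * x)


# Train on a small set, then predict for a larger out-of-sample set.
train_x = [3.0 * i / 14 for i in range(15)]
model = fit_delta_model(train_x,
                        [cheap_method(x) for x in train_x],
                        [expensive_method(x) for x in train_x])

rng = random.Random(0)
test_x = [rng.uniform(0.0, 3.0) for _ in range(200)]

# Mean absolute error of the raw baseline versus baseline + learned correction.
mae_baseline = sum(abs(expensive_method(x) - cheap_method(x))
                   for x in test_x) / len(test_x)
mae_delta = sum(abs(expensive_method(x) - model(x, cheap_method(x)))
                for x in test_x) / len(test_x)

print("baseline MAE:", mae_baseline)
print("baseline + correction MAE:", mae_delta)
```

The point of the design is that the model only has to learn the difference between the two levels of theory, which is often smoother and smaller than the target property itself; at prediction time only the cheap method needs to be evaluated for a new molecule.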
