Size‐Extensive Molecular Machine Learning with Global Representations

Machine learning (ML) models are increasingly used in combination with electronic structure calculations to predict molecular properties at a much lower computational cost in high-throughput settings. Such ML models require representations that encode the molecular structure, and these are generally designed to respect the symmetries and invariances of the target property. However, size-extensivity is usually not guaranteed for so-called global representations. In this contribution, we show how extensivity can be built into global ML models using, e.g., the Many-Body Tensor Representation. Properties of extensive and non-extensive models for the atomization energy are systematically explored by training on small molecules and testing on small, medium and large molecules. Our results show that non-extensive models are only useful in the size range of their training set, whereas extensive models provide reasonable predictions across large size differences. Remaining sources of error for extensive models are discussed.
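The difference between extensive and non-extensive global models can be illustrated with a deliberately simple toy sketch. It is not the method of this work: element counts stand in for a global representation such as the Many-Body Tensor Representation, a linear least-squares fit stands in for the regression model, and the per-atom energies, molecule list, and all function names are hypothetical choices made only for illustration. The extensive variant uses unnormalized counts, so doubling a system doubles its features. The non-extensive variant divides by the number of atoms, which removes the size information.

```python
# Toy sketch (assumptions: hypothetical per-atom energies, element counts as
# a stand-in for a global representation, linear least squares as the model).

E_C, E_H = -5.0, -3.0  # hypothetical per-atom energy contributions


def toy_energy(n_c, n_h):
    """Strictly additive toy target, hence exactly size-extensive."""
    return E_C * n_c + E_H * n_h


def extensive_features(n_c, n_h):
    """Unnormalized counts: features scale with system size."""
    return (float(n_c), float(n_h))


def normalized_features(n_c, n_h):
    """Counts divided by the number of atoms: size information is lost."""
    total = n_c + n_h
    return (n_c / total, n_h / total)


def fit_linear(X, y):
    """Two-feature least squares without intercept via the normal equations."""
    a00 = sum(x[0] * x[0] for x in X)
    a01 = sum(x[0] * x[1] for x in X)
    a11 = sum(x[1] * x[1] for x in X)
    b0 = sum(x[0] * t for x, t in zip(X, y))
    b1 = sum(x[1] * t for x, t in zip(X, y))
    det = a00 * a11 - a01 * a01
    return ((b0 * a11 - b1 * a01) / det, (a00 * b1 - a01 * b0) / det)


def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]


# Train on small molecules only, given as (n_C, n_H).
train = [(1, 4), (2, 6), (2, 4), (3, 8)]
y_train = [toy_energy(*m) for m in train]

w_ext = fit_linear([extensive_features(*m) for m in train], y_train)
w_norm = fit_linear([normalized_features(*m) for m in train], y_train)

# Test on a much larger molecule that lies outside the training size range.
large = (20, 42)
e_true = toy_energy(*large)
e_ext = predict(w_ext, extensive_features(*large))
e_norm = predict(w_norm, normalized_features(*large))

print("true:", e_true)
print("extensive model:", e_ext)
print("non-extensive model:", e_norm)
```

In this toy setting the extensive model extrapolates to the large molecule, while the normalized model returns an energy typical of the small training molecules. Real molecules are not strictly additive, so an extensive model of a real property would still carry residual errors.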
