On the role of gradients for machine learning of molecular energies and forces

The accuracy of any machine learning potential can only be as good as the data used in the fitting process. The most efficient model therefore selects the training data that will yield the highest accuracy relative to the cost of obtaining that data. We investigate the convergence of prediction errors of quantum machine learning models for organic molecules trained on energy and force labels, two common data types in molecular simulations. When training and predicting on different geometries of the same single molecule, we find that including atomic forces in the training data increases the accuracy of the predicted energies and forces 7-fold, compared to models trained on energies only. Surprisingly, for models trained on sets of organic molecules of varying size and composition in non-equilibrium conformations, including forces in the training does not improve the predicted energies of unseen molecules in new conformations. Predicted forces, however, still improve about 7-fold. For the systems studied, we find that force labels and energy labels contribute equally per label to the convergence of the prediction errors. Whether to include derivatives such as atomic forces in the training set should thus depend not only on the computational cost of acquiring the force labels, but also on the application domain, the property of interest, and the desired size of the machine learning model. Based on our observations, we describe key considerations for creating datasets for potential energy surfaces of molecules that maximize the efficiency of the resulting machine learning models.
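To make the idea of training on energy and force labels concrete, the following is a minimal sketch of kernel regression with derivative labels on a one-dimensional toy potential. Everything in it is an assumption made for illustration: the Morse-type potential, the Gaussian kernel, the length scale `ell`, and the function names `morse`, `kernel_blocks`, `fit`, and `predict` are hypothetical and are not the models, representations, or datasets used in the study. It only shows how gradient labels (forces are the negative gradient) enter the kernel matrix alongside energy labels.

```python
import numpy as np

def morse(x, D=1.0, a=1.5, x0=1.5):
    """Toy 1D potential (assumed): returns energy and gradient dE/dx."""
    u = np.exp(-a * (x - x0))
    return D * (1.0 - u) ** 2, 2.0 * D * a * (1.0 - u) * u

def kernel_blocks(xa, xb, ell):
    """Gaussian kernel and its derivatives between points xa and xb."""
    d = xa[:, None] - xb[None, :]
    k = np.exp(-0.5 * d**2 / ell**2)           # cov(E(xa),  E(xb))
    k_eg = k * d / ell**2                      # cov(E(xa),  E'(xb))
    k_ge = -k * d / ell**2                     # cov(E'(xa), E(xb))
    k_gg = k * (1.0 / ell**2 - d**2 / ell**4)  # cov(E'(xa), E'(xb))
    return k, k_eg, k_ge, k_gg

def fit(x, energies, gradients=None, ell=0.3, jitter=1e-10):
    """Solve for regression coefficients; gradients are optional labels."""
    k, k_eg, k_ge, k_gg = kernel_blocks(x, x, ell)
    if gradients is None:
        K, y = k, energies
    else:
        K = np.block([[k, k_eg], [k_ge, k_gg]])
        y = np.concatenate([energies, gradients])
    return np.linalg.solve(K + jitter * np.eye(len(y)), y)

def predict(x_new, x_train, alpha, ell=0.3):
    """Return predicted energies and gradients at x_new."""
    k, k_eg, k_ge, k_gg = kernel_blocks(x_new, x_train, ell)
    if len(alpha) == len(x_train):  # model was trained on energies only
        return k @ alpha, k_ge @ alpha
    return np.hstack([k, k_eg]) @ alpha, np.hstack([k_ge, k_gg]) @ alpha

# Six training geometries of the same toy system.
x_train = np.linspace(1.0, 3.0, 6)
e_train, g_train = morse(x_train)

alpha_e = fit(x_train, e_train)             # 6 energy labels
alpha_ef = fit(x_train, e_train, g_train)   # 6 energy + 6 gradient labels

# Compare mean absolute errors on unseen geometries.
x_test = np.linspace(1.1, 2.9, 50)
e_ref, g_ref = morse(x_test)
for name, alpha in [("energies only", alpha_e), ("energies + forces", alpha_ef)]:
    e_pred, g_pred = predict(x_test, x_train, alpha)
    print(name,
          "energy MAE:", np.abs(e_pred - e_ref).mean(),
          "force MAE:", np.abs(g_pred - g_ref).mean())
```

In this sketch each geometry contributes one energy label and one gradient label. For a molecule with n atoms in three dimensions, each geometry would instead contribute one energy label and 3n force components, which is why the per-label comparison in the abstract matters when weighing model size against accuracy.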
