Metadynamics for training neural network model chemistries: A competitive assessment.

Neural network model chemistries (NNMCs) promise to facilitate the accurate exploration of chemical space and the simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data-driven and data-limited. Little is systematically known about how data should be sampled, and "test data" chosen randomly from the output of a sampling technique can provide poor information about generality. If the sampling method is narrow, "test error" can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript, we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling, and one uncommon alternative, metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package, with a cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure that samples always reach out to new regions of chemical space while remaining relevant to chemistry near k_BT. It is a cheap tool for addressing the issue of generalization.
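
To illustrate the idea behind metadynamics sampling, the following is a minimal sketch, not the implementation assessed in this work. It assumes a single scalar coordinate x standing in for whatever collective variables a real NNMC package would bias. The names bias_energy and bias_force and the parameters height and width are hypothetical. Gaussian "bumps" are deposited at previously visited positions, and the resulting bias force pushes the system away from them, toward regions it has not yet sampled.

```python
import math


def bias_energy(x, centers, height=0.5, width=0.2):
    """Bias potential: a sum of Gaussian bumps centered on visited positions.

    x       -- current value of the (assumed, one-dimensional) coordinate
    centers -- list of positions where bumps were deposited earlier
    height, width -- illustrative bump parameters, not values from the paper
    """
    return sum(
        height * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    )


def bias_force(x, centers, height=0.5, width=0.2):
    """Force from the bias, i.e. minus the derivative of bias_energy in x.

    Each term points away from its bump center, so the accumulated bias
    repels the system from geometries it has already visited.
    """
    return sum(
        height * (x - c) / width ** 2
        * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
        for c in centers
    )


if __name__ == "__main__":
    visited = [0.0]           # one bump deposited at x = 0
    for x in (-0.1, 0.1):
        # The sign of the force shows the push away from the visited point.
        print(x, bias_energy(x, visited), bias_force(x, visited))
```

In a sampling run, this bias force would be added to the physical force at every step and a new center appended to the list at regular intervals, so that the cost of the bias grows with the number of deposited bumps.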
