Training neural nets to learn reactive potential energy surfaces using interactive quantum chemistry in virtual reality

Not long ago, processing power was the primary bottleneck in many computational workflows. The rise of machine learning technologies has shifted this paradigm, placing increasing value on issues related to data curation: that is, data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source, graphics processing unit (GPU)-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of the CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data with that of NNs trained using a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data performs very well when predicting energies close to the MEP but less well when predicting energies for "off-path" structures. The NN trained on the constrained MD data performs better on high-energy off-path structures, because its training set included a number of such structures.
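
The comparison described above, accuracy near the MEP versus accuracy for off-path structures, can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `mae_by_region`, the cutoff `e_cut`, and the use of the reference energy as a proxy for "off-path" are all assumptions made here for illustration.

```python
import numpy as np


def mae_by_region(e_ref, e_pred, e_cut):
    """Mean absolute error split by a reference-energy cutoff.

    Hypothetical helper: structures with e_ref <= e_cut are treated as
    "near-path" and the rest as "off-path". Using energy as the proxy
    for distance from the MEP is an assumption of this sketch.
    """
    e_ref = np.asarray(e_ref, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    abs_err = np.abs(e_pred - e_ref)

    near = e_ref <= e_cut  # boolean mask for low-energy structures
    mae_near = abs_err[near].mean() if near.any() else float("nan")
    mae_off = abs_err[~near].mean() if (~near).any() else float("nan")
    return mae_near, mae_off


# Toy numbers only, not data from the paper.
mae_near, mae_off = mae_by_region(
    e_ref=[0.0, 1.0, 10.0, 20.0],
    e_pred=[0.5, 1.5, 12.0, 24.0],
    e_cut=5.0,
)
```

Running the same split for NNs trained on different data sets would show whether a model that is accurate near the MEP degrades on high-energy structures it rarely saw during training.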
