Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials.

Machine learning of atomic-scale properties is revolutionizing molecular modeling, making it possible to evaluate interatomic potentials with first-principles accuracy at a fraction of the cost. The accuracy, speed, and reliability of machine-learning potentials, however, depend strongly on how atomic configurations are represented, i.e., on the choice of the descriptors used as input for the machine-learning method. The raw Cartesian coordinates are typically transformed into "fingerprints," or "symmetry functions," which are designed to encode, in addition to the structure, important properties of the potential energy surface, such as its invariance with respect to rotation, translation, and permutation of like atoms. Here we discuss automatic protocols to select a subset of fingerprints from a large pool of candidates, based on the correlations that are intrinsic to the training data. This procedure can greatly simplify the construction of neural network potentials that strike the best balance between accuracy and computational efficiency, and it could accelerate by orders of magnitude the evaluation of Gaussian approximation potentials based on the smooth overlap of atomic positions (SOAP) kernel. We present applications to the construction of neural network potentials for water and for an Al-Mg-Si alloy, and to the prediction of the formation energies of small organic molecules using Gaussian process regression.
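The idea of building a large pool of candidate fingerprints and then selecting a subset from correlations in the training data can be illustrated with a minimal sketch. It assumes radial Behler-type G2 symmetry functions with a cosine cutoff as the candidate pool, an isolated cluster without periodic boundary conditions, and a greedy CUR-style column selection: at each step the candidate with the largest leverage score (from the top-k right singular vectors) is picked, and then projected out of all remaining candidates. The function names (`candidate_pool`, `cur_select`), the parameter grid, the toy random cluster, and the default `k=1` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cutoff(r, rc):
    # Cosine cutoff f_c(r) = 0.5 * (cos(pi r / rc) + 1) for r < rc, else 0.
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)


def candidate_pool(positions, etas, shifts, rc):
    """Radial G2 fingerprints: one row per atom, one column per (eta, rs).

    G2_i = sum_{j != i} exp(-eta * (r_ij - rs)**2) * f_c(r_ij).
    Only interatomic distances enter and the neighbour sum is
    order-independent, so the result is invariant to rotation, translation
    and permutation of like atoms. Assumes an isolated cluster (no PBC).
    """
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    # Zero the diagonal so an atom does not count itself as a neighbour.
    fc = cutoff(r, rc) * ~np.eye(len(positions), dtype=bool)
    params = [(eta, rs) for eta in etas for rs in shifts]
    cols = [np.sum(np.exp(-eta * (r - rs) ** 2) * fc, axis=1) for eta, rs in params]
    return np.column_stack(cols), params


def cur_select(X, n_select, k=1):
    """Greedy CUR-style selection of columns (fingerprints) of X.

    Each step picks the column with the largest leverage score from the
    top-k right singular vectors, then removes that column's direction from
    every column, so the next pick adds information not yet covered.
    """
    # Standardize each candidate so units and magnitudes do not bias the choice.
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = X / std
    selected = []
    for _ in range(min(n_select, X.shape[1])):
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        scores = np.sum(vt[:k] ** 2, axis=0)  # leverage score per column
        if selected:
            scores[selected] = -np.inf  # never pick a column twice
        j = int(np.argmax(scores))
        selected.append(j)
        c = X[:, j]
        norm2 = c @ c
        if norm2 > 0:
            # Orthogonalize all columns against the selected column.
            X = X - np.outer(c, c @ X) / norm2
    return selected


# Usage on a hypothetical 30-atom random cluster with a 4 x 5 grid of (eta, rs).
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 6.0, size=(30, 3))
etas, shifts, rc = [0.5, 1.0, 2.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0], 6.0
X, params = candidate_pool(positions, etas, shifts, rc)
chosen = cur_select(X, n_select=4)
print([params[j] for j in chosen])
```

The orthogonalization step is what keeps the procedure from returning several strongly correlated fingerprints: once one candidate is chosen, any near-duplicate of it is reduced to almost nothing and scores close to zero in the next iteration.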
