Machine learning unifies the modeling of materials and molecules

Statistical learning based on a local representation of atomic structures provides a universal model of chemical stability. Determining the stability of molecules and condensed phases is the cornerstone of atomistic modeling, underpinning our understanding of chemical and materials properties and transformations. We show that a machine-learning model, based on a local description of chemical environments and Bayesian statistical learning, provides a unified framework to predict atomic-scale properties. It captures the quantum mechanical effects governing the complex surface reconstructions of silicon, predicts the stability of different classes of molecules with chemical accuracy, and distinguishes active and inactive protein ligands with more than 99% reliability. The universality and the systematic nature of our framework provide new insight into the potential energy surface of materials and molecules.
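As a rough illustration of what "a local description of chemical environments and Bayesian statistical learning" can look like in practice, the sketch below builds a similarity between two structures by averaging a kernel over all pairs of their atomic environments, then fits energies by Gaussian-process-style regression. This is a generic sketch under stated assumptions, not the authors' implementation: the descriptor vectors are random stand-ins, and the function names (`structure_kernel`, `fit`, `predict`) and the parameters `zeta` and `sigma` are hypothetical.

```python
import numpy as np

def structure_kernel(A, B, zeta=2):
    """Average environment similarity between two structures.

    A, B: arrays of shape (n_atoms, n_features), one descriptor row per atom.
    Each pair of environments is compared by a normalized dot product
    raised to the power zeta (an assumed, illustrative choice of kernel).
    """
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(np.mean((An @ Bn.T) ** zeta))

def fit(structures, energies, sigma=1e-3, zeta=2):
    """Solve (K + sigma^2 I) alpha = y for the regression weights alpha."""
    K = np.array([[structure_kernel(a, b, zeta) for b in structures]
                  for a in structures])
    return np.linalg.solve(K + sigma**2 * np.eye(len(structures)),
                           np.asarray(energies, dtype=float))

def predict(structure, structures, alpha, zeta=2):
    """Posterior-mean prediction: kernel vector to the training set times alpha."""
    k = np.array([structure_kernel(structure, s, zeta) for s in structures])
    return float(k @ alpha)

# Toy data: four "structures" with different atom counts and
# 8-dimensional stand-in descriptors (random numbers, not real chemistry).
rng = np.random.default_rng(0)
train = [rng.normal(size=(n, 8)) for n in (3, 4, 5, 6)]
y = rng.normal(size=len(train))

alpha = fit(train, y)
predictions = [predict(s, train, alpha) for s in train]
```

Because the structure kernel is an average over atomic environments, the same code handles structures with different numbers of atoms, which is what lets one model treat molecules and extended materials on an equal footing. With a small `sigma`, the fit should reproduce the training values closely; predictions for unseen structures depend entirely on the quality of the descriptors, which this sketch does not address.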
