A Data-Driven Construction of the Periodic Table of the Elements

Machine learning of atomic-scale properties amounts to extracting correlations between structure, composition, and the quantity one wants to predict. Representing the input structure in a way that best reflects these correlations makes it possible to improve the accuracy of a model for a given amount of reference data. When the description of the structures is transparent and well principled, optimizing the representation can also reveal insights into the chemistry of the data set. Here we show how the SOAP kernel can be generalized to introduce a distance-dependent weight that accounts for the multi-scale nature of interatomic interactions, together with a description of correlations between chemical species. We show that this substantially improves the performance of ML models of molecular and materials stability, while making it easier to work with complex, multi-component systems and to extend SOAP to coarse-grained intermolecular potentials. The element correlations that yield the best-performing model show striking similarities to the conventional periodic table of the elements, providing an inspiring example of how machine learning can rediscover, and generalize, intuitive concepts that lie at the foundations of chemistry.
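The two generalizations described above can be illustrated schematically. The sketch below is a toy model, not the paper's implementation: it assumes a rational radial scaling of the form 1/(1 + (r/r0)^m) for the distance-dependent weight, and a species-coupling ("alchemical") matrix `kappa`, where the identity matrix recovers the conventional limit in which different elements are treated as entirely distinct. The function names, the pairwise Gaussian overlap, and all parameter values are illustrative assumptions.

```python
import numpy as np

def radial_scaling(r, r0=2.0, m=4):
    """Hypothetical distance-dependent weight: atoms near the center
    contribute fully, distant atoms are smoothly down-weighted."""
    return 1.0 / (1.0 + (r / r0) ** m)

def environment_kernel(env_a, env_b, kappa, sigma=0.5):
    """Toy similarity between two atomic environments.

    Each environment is a list of (species_index, distance) pairs
    measured from the central atom. kappa[a, b] couples chemical
    species a and b; kappa = identity means species never mix.
    """
    k = 0.0
    for sa, ra in env_a:
        for sb, rb in env_b:
            # Species coupling x radial weights x positional overlap
            k += (kappa[sa, sb]
                  * radial_scaling(ra) * radial_scaling(rb)
                  * np.exp(-(ra - rb) ** 2 / (2 * sigma ** 2)))
    return k
```

With `kappa = np.eye(n_species)` two environments containing only different elements have zero similarity; allowing small off-diagonal entries lets the model share information between chemically related species, which is the mechanism that, once optimized against data, can recover periodic-table-like groupings.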
