DScribe: Library of Descriptors for Machine Learning in Materials Science

DScribe is a software package for machine learning that provides popular feature transformations ("descriptors") for atomistic materials simulations. It accelerates the application of machine learning to atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations of the Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated with two applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0.
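
To make the idea of a descriptor concrete, the following is a minimal sketch of the simplest one listed above, the Coulomb matrix, written in plain Python. It uses the standard definition (diagonal entries 0.5 Z_i^2.4, off-diagonal entries Z_i Z_j / |R_i - R_j|) and is not DScribe's own implementation or API; the function name `coulomb_matrix` and the toy two-atom geometry are assumptions made purely for illustration.

```python
import math


def coulomb_matrix(charges, positions):
    """Build a Coulomb matrix as a list of rows (illustrative sketch only).

    charges:   atomic numbers Z_i, one per atom
    positions: Cartesian coordinates (x, y, z), one tuple per atom
    Distances are taken in whatever unit the positions are given in.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted estimate of the free-atom energy.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical toy system: an O and an H nucleus separated by 2.0 length units.
toy_charges = [8, 1]
toy_positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
toy_matrix = coulomb_matrix(toy_charges, toy_positions)
```

The resulting matrix is symmetric and unchanged by rigid translations and rotations of the structure, which is the kind of invariance a descriptor is meant to provide. It still depends on the order of the atoms, so in practice the rows and columns are usually sorted or the matrix is otherwise post-processed before being used as model input.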
