The structural information filtered features (SIFF) potential: Maximizing information stored in machine-learning descriptors for materials prediction

Machine-learning-inspired potentials continue to improve our ability to predict the structures of materials. However, many challenges remain, particularly when calculating structures of disordered systems. These challenges stem primarily from the rapidly increasing dimensionality of the feature-vector space, which in most machine-learning algorithms depends on the size of the structure. In this article, we present a feature-engineered approach that establishes a set of principles for representing potentials of physical structures (crystals, molecules, and clusters) in a feature space rather than a physically motivated space. Our goal is to define guiding principles that optimize how much information about the physical parameters is stored in the feature representations. To this end, we keep the dimensionality of the feature space independent of the number of atoms in the structure. Our Structural Information Filtered Features (SIFF) potential represents structures with a feature vector of low-correlated descriptors, which correspondingly maximizes the information within the descriptor. We present results for our SIFF potential on datasets composed of disordered (carbon and carbon–oxygen) clusters, molecules with C7O2H2 stoichiometry in the GDB9-14B dataset, and crystal structures of the form (AlxGayInz)2O3 as proposed in the NOMAD Kaggle competition. Our potential's performance is at least comparable to, sometimes significantly more accurate than, and often more efficient than other well-known machine-learning potentials for structure prediction. Above all, however, we offer a different perspective on how researchers should consider opportunities for maximizing the information stored in features.
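
The two ideas stated above, a feature vector whose length does not depend on the number of atoms and a set of descriptors with low mutual correlation, can be illustrated with a minimal sketch. This is not the SIFF implementation. The pair-distance histogram, the greedy Pearson-correlation filter, the function names, and the parameters `r_max`, `n_bins`, and `threshold` are all assumptions chosen only for illustration.

```python
import numpy as np


def pair_distance_histogram(positions, r_max=6.0, n_bins=16):
    """Hypothetical size-independent descriptor: a normalized histogram of
    interatomic distances. Its length is n_bins for any number of atoms."""
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(positions), k=1)  # each atom pair counted once
    hist, _ = np.histogram(dist[upper], bins=n_bins, range=(0.0, r_max))
    return hist / max(len(upper[0]), 1)  # divide by the number of pairs


def filter_correlated(X, threshold=0.95):
    """Greedy filter (an assumed stand-in, not the paper's procedure): keep a
    feature column only if its absolute Pearson correlation with every
    already-kept column is below `threshold`. Constant columns are dropped."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    kept = []
    for j in range(X.shape[1]):
        if std[j] == 0:
            continue  # a constant feature carries no information
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < threshold for k in kept):
            kept.append(j)
    return kept


# Random "clusters" of different sizes all map to vectors of the same length.
rng = np.random.default_rng(0)
clusters = [rng.uniform(0.0, 4.0, size=(n, 3)) for n in (5, 8, 12, 20, 30, 9, 14, 25)]
X = np.array([pair_distance_histogram(c) for c in clusters])
kept = filter_correlated(X)
X_filtered = X[:, kept]  # low-correlated subset of the descriptors
```

Here every cluster, whether it has 5 or 30 atoms, is mapped to a vector of length `n_bins`, and `filter_correlated` discards descriptors that are nearly redundant with ones already kept. A regression model for the energy would then be trained on `X_filtered`.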