Machine learning in materials informatics: recent applications and prospects

Propelled partly by the Materials Genome Initiative, and partly by algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than on direct experimentation or on computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful for determining material properties that are hard to measure or compute using traditional methods (due to the cost, time or effort involved) but for which reliable data either already exist or can be generated for at least a subset of the critical cases. Predictions are typically interpolative: a material is first fingerprinted numerically, and a mapping (established via a learning algorithm) between the fingerprint and the property of interest is then followed. Fingerprints, also referred to as “descriptors”, may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative, extending into new materials spaces, provided prediction uncertainties are properly taken into account. This article provides an overview of some successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies challenges the community currently faces and those that should be overcome in the near future.
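The fingerprint-then-map workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method of any work reviewed here: the components "A", "B" and "C" and their attribute values are made-up toy numbers, the target property is a synthetic function, and a zero-mean Gaussian process with a squared-exponential kernel stands in for the learning algorithm because it returns a prediction uncertainty alongside each prediction.

```python
import numpy as np

# Toy per-component attributes (made-up numbers, NOT real elemental data).
TOY_ATTRIBUTES = {"A": [1.0, 0.2], "B": [2.0, 0.9], "C": [3.5, 0.4]}


def fingerprint(composition):
    """Hypothetical fingerprint: fraction-weighted mean of the toy attributes."""
    total = sum(composition.values())
    return sum((frac / total) * np.array(TOY_ATTRIBUTES[name])
               for name, frac in composition.items())


def rbf_kernel(X1, X2, length_scale):
    """Squared-exponential similarity between two sets of fingerprints."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def fit_gp(X, y, length_scale=0.5, noise=1e-6):
    """Learn the fingerprint -> property mapping (zero-mean GP regression)."""
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return {"X": X, "L": L, "alpha": alpha, "length_scale": length_scale}


def predict_gp(model, X_new):
    """Return the predicted property and its standard deviation."""
    K_s = rbf_kernel(X_new, model["X"], model["length_scale"])
    mean = K_s @ model["alpha"]
    v = np.linalg.solve(model["L"], K_s.T)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance of the kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


# "Past data": A-B mixtures with a synthetic property (stand-in for measurements).
train_compositions = [{"A": 1 - x, "B": x} for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
X_train = np.array([fingerprint(c) for c in train_compositions])
y_train = np.sin(X_train[:, 0]) + X_train[:, 1]

model = fit_gp(X_train, y_train)

# Interpolative query: a new A-B mixture inside the sampled range.
mean_in, std_in = predict_gp(model, np.array([fingerprint({"A": 0.6, "B": 0.4})]))
# Extrapolative query: pure C, far from every training fingerprint.
mean_out, std_out = predict_gp(model, np.array([fingerprint({"C": 1.0})]))

print("interpolative :", mean_in[0], "+/-", std_in[0])
print("extrapolative :", mean_out[0], "+/-", std_out[0])
```

In this sketch the extrapolative query receives a much larger predicted standard deviation than the interpolative one, which is the kind of signal that has to be taken into account before trusting a prediction in a new materials space.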
