A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials

Devising machine learning methods that automatically extract predictive models from existing materials data is a very active area of materials research. While prior work has demonstrated successful models for some applications, many more applications exist where machine learning could make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework that can be applied to a broad range of materials data. Our method combines a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, with a novel method for partitioning the data set into groups of similar materials in order to boost predictive accuracy. In this manuscript, we demonstrate how this method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.
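
The two ingredients named in the abstract, composition-based attributes and partitioning of the data into groups of similar materials, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the manuscript: the two elemental properties used, the `partition_key` rule (contains a nonmetal or not), the 1-nearest-neighbour model fitted inside each group, and the names `attributes`, `partition_key`, and `PartitionedNearestNeighbor`. The target values in the usage note are arbitrary placeholders, not measured data.

```python
import math
from collections import defaultdict

# Elemental properties: (atomic number, Pauling electronegativity).
ELEMENT_PROPS = {
    "O": (8, 3.44),
    "Na": (11, 0.93),
    "Al": (13, 1.61),
    "Cl": (17, 3.16),
    "Cu": (29, 1.90),
    "Zr": (40, 1.33),
}
# Hypothetical partition rule for this sketch only.
NONMETALS = {"O", "Cl"}


def attributes(composition):
    """Map {element: amount} to [mean Z, range Z, mean EN, range EN]."""
    total = sum(composition.values())
    fractions = {el: amt / total for el, amt in composition.items()}
    feats = []
    for i in range(2):
        values = [ELEMENT_PROPS[el][i] for el in fractions]
        # Fraction-weighted mean of the elemental property.
        feats.append(sum(f * ELEMENT_PROPS[el][i] for el, f in fractions.items()))
        # Spread of the property across the constituent elements.
        feats.append(max(values) - min(values))
    return feats


def partition_key(composition):
    """Assign a composition to a group of chemically similar materials."""
    return "has_nonmetal" if NONMETALS & set(composition) else "all_metal"


class PartitionedNearestNeighbor:
    """One 1-nearest-neighbour model per partition (a stand-in learner)."""

    def fit(self, compositions, targets):
        self.groups_ = defaultdict(list)
        for comp, y in zip(compositions, targets):
            self.groups_[partition_key(comp)].append((attributes(comp), y))
        return self

    def predict(self, composition):
        key = partition_key(composition)
        if key not in self.groups_:
            raise ValueError(f"no training data in partition {key!r}")
        x = attributes(composition)
        # Unscaled Euclidean distance; real use would normalize attributes.
        _, y = min(self.groups_[key], key=lambda pair: math.dist(pair[0], x))
        return y
```

As a usage illustration, fitting on `[{"Na": 1, "Cl": 1}, {"Al": 2, "O": 3}, {"Cu": 1, "Zr": 1}, {"Cu": 1, "Al": 1}]` with placeholder targets `[1.0, 2.0, 10.0, 20.0]` and then calling `predict({"Cu": 2, "Zr": 1})` searches only the all-metal group, so the oxide and chloride entries cannot influence the result. Restricting each model to one group is the point of the partitioning step: each learner only has to capture trends within a set of similar materials.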
