Can machine learning find extraordinary materials?

One of the most common criticisms of machine learning is an assumed inability of models to extrapolate, i.e., to identify extraordinary materials with properties beyond those present in the training data set. To test whether this is indeed the case, this work uses density functional theory (DFT)-calculated properties (bulk modulus, shear modulus, thermal conductivity, thermal expansion, band gap, and Debye temperature) to investigate whether machine learning can truly predict materials whose property values extend beyond those previously seen. We refer to these materials as extraordinary, meaning that their property values fall within the top 1% of the available data set. We show that, even when models are trained on only a fraction of the bottom 99% of the data, we can consistently identify 3/4 of the highest-performing compositions for all considered properties, typically with a precision above 0.5. Moreover, we compare several modeling choices and demonstrate that a classification approach identifies an equivalent number of extraordinary compounds with significantly fewer false positives than a regression approach. Finally, we discuss caveats and potential limitations of using such an approach to discover new record-breaking materials.
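
The evaluation setup summarized above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' code. The following are all assumptions chosen for illustration:
- the toy one-descriptor data;
- the least-squares line standing in for a trained model;
- the helper names (`split_by_top_fraction`, `precision_recall`, `fit_line`, `run_experiment`);
- the rule of flagging any test compound whose predicted value reaches the top-1% threshold.

What the sketch does mirror is the stated protocol. Values in the top 1% are labeled extraordinary, and the model is fit only on a random subset of the bottom 99%. Screening is then scored by recall (the fraction of extraordinary compounds flagged, which is what "3/4 of the highest-performing compositions" measures) and by precision (the fraction of flagged compounds that are truly extraordinary).

```python
import random


def split_by_top_fraction(values, top_fraction=0.01):
    """Split indices into ordinary (bottom) and extraordinary (top `top_fraction`) groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_top = max(1, round(len(values) * top_fraction))
    ordinary, extraordinary = order[:-n_top], order[-n_top:]
    threshold = values[extraordinary[0]]  # smallest value that counts as extraordinary
    return threshold, ordinary, extraordinary


def precision_recall(true_idx, flagged_idx):
    """Precision and recall of the flagged set against the truly extraordinary set."""
    true_set, flagged_set = set(true_idx), set(flagged_idx)
    hits = len(true_set & flagged_set)
    precision = hits / len(flagged_set) if flagged_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return precision, recall


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (hypothetical stand-in for an ML model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def run_experiment(xs, ys, train_fraction=0.5, top_fraction=0.01, seed=0):
    """Train on part of the bottom 99%, then screen held-out ordinary + all extraordinary."""
    rng = random.Random(seed)
    threshold, ordinary, extraordinary = split_by_top_fraction(ys, top_fraction)
    rng.shuffle(ordinary)  # `ordinary` is a fresh list, safe to shuffle in place
    n_train = int(len(ordinary) * train_fraction)
    train, held_out = ordinary[:n_train], ordinary[n_train:]
    test = held_out + extraordinary  # no extraordinary value is ever used for training
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Assumed regression-style decision rule: flag a test compound when its
    # predicted value reaches the extraordinary threshold.
    flagged = [i for i in test if a * xs[i] + b >= threshold]
    return precision_recall(extraordinary, flagged)


if __name__ == "__main__":
    data_rng = random.Random(42)
    xs = [data_rng.uniform(0.0, 10.0) for _ in range(5000)]
    ys = [2.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]  # toy property with noise
    precision, recall = run_experiment(xs, ys, train_fraction=0.5)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A classification variant would replace the `>= threshold` rule with a classifier's predicted label. How such training labels are built from bottom-99% data is not specified here, so that variant is not sketched.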
