Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery

Traditional machine learning (ML) metrics overestimate model performance for materials discovery. We introduce (1) leave-one-cluster-out cross-validation (LOCO CV) and (2) a simple nearest-neighbor benchmark to show that model performance in discovery applications strongly depends on the problem, data sampling, and extrapolation. Our results suggest that ML-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials like high-Tc superconductors.
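The two ideas named above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`loco_splits`, `nearest_neighbor_predict`, `loco_nn_mae`), the toy data, and the choice of Euclidean distance and mean absolute error are all assumptions made for illustration, and the cluster labels are assumed to be given rather than computed. Each fold holds out one entire cluster, so the model is always scored on materials unlike anything in its training set.

```python
from math import dist  # Euclidean distance, Python 3.8+


def loco_splits(cluster_labels):
    """Yield (held_out_label, train_idx, test_idx), holding out one whole cluster per fold."""
    for held_out in sorted(set(cluster_labels)):
        train = [i for i, c in enumerate(cluster_labels) if c != held_out]
        test = [i for i, c in enumerate(cluster_labels) if c == held_out]
        yield held_out, train, test


def nearest_neighbor_predict(X_train, y_train, x):
    """Baseline: return the target of the closest training point."""
    nearest = min(range(len(X_train)), key=lambda i: dist(X_train[i], x))
    return y_train[nearest]


def loco_nn_mae(X, y, cluster_labels):
    """Mean absolute error of the nearest-neighbor baseline for each held-out cluster."""
    errors = {}
    for label, train, test in loco_splits(cluster_labels):
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        abs_err = [abs(nearest_neighbor_predict(X_train, y_train, X[i]) - y[i])
                   for i in test]
        errors[label] = sum(abs_err) / len(abs_err)
    return errors


# Hypothetical toy data: two well-separated clusters with different target values.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
clusters = [0, 0, 0, 1, 1, 1]  # assumed precomputed, e.g. by a clustering step

fold_errors = loco_nn_mae(X, y, clusters)
print(fold_errors)
```

On this toy data a random split would let each test point find a same-cluster neighbor with the identical target, giving near-zero error, whereas every LOCO fold forces the baseline to extrapolate to an unseen cluster and the error is large. A candidate ML model can be scored on the same folds and compared against this baseline.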
