Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights

Within the paradigm of virtual high-throughput screening for materials, we have developed a semiautomated workflow, or “recipe”, that helps a materials scientist start from a raw data set of materials with their properties and descriptors, build predictive models, and draw insights into the governing mechanism. The recipe employs machine learning tools and statistical analysis; we demonstrate it on a case study that identifies descriptors relevant to catalysts for CO2 electroreduction, starting from a published database of 298 catalyst alloys. At the heart of our methodology lies the Bootstrapped Projected Gradient Descent (BoPGD) algorithm, which has significant advantages over commonly used machine learning (ML) and statistical analysis (SA) tools such as the regression coefficient shrinkage-based method (LASSO) or artificial neural networks: (a) it selects descriptors with greater stability and transferability, with the goal of understanding the chemical mechanism rather ...
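The abstract names BoPGD but does not spell out its steps here, so the following is only a generic sketch of the idea the name suggests, not the authors' implementation. It assumes a least-squares loss, a projection that keeps the k largest-magnitude coefficients (iterative hard thresholding), and plain row-wise bootstrap resampling. The function names, the sparsity level `k`, and the use of selection frequency as a stability measure are all assumptions made for illustration.

```python
import numpy as np


def hard_threshold(w, k):
    """Project w onto vectors with at most k nonzero entries (assumed projection)."""
    out = np.zeros_like(w)
    if k > 0:
        keep = np.argsort(np.abs(w))[-k:]  # indices of the k largest magnitudes
        out[keep] = w[keep]
    return out


def sparse_pgd(X, y, k, n_iter=200):
    """Minimise 0.5 * ||Xw - y||^2 subject to at most k nonzeros in w,
    using projected gradient descent with a fixed step size."""
    p = X.shape[1]
    # Step size 1/L, where L is the squared spectral norm of X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = hard_threshold(w - step * grad, k)
    return w


def bootstrap_selection_frequency(X, y, k, n_boot=100, seed=0):
    """Refit on bootstrap resamples of the rows and return, for each
    descriptor, the fraction of resamples in which it was selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        w = sparse_pgd(X[idx], y[idx], k)
        counts += (w != 0)
    return counts / n_boot
```

In this sketch, a descriptor whose selection frequency stays close to 1 across resamples would be treated as stable, while one that appears only occasionally would be treated as a likely chance correlation. In practice the columns of `X` would usually be standardised first so that coefficient magnitudes are comparable.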
