Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships

X-ray absorption spectroscopy (XAS) produces a wealth of information about the local structure of materials, but interpretation of spectra often relies on easily accessible trends and prior assumptions about the structure. Recently, researchers have demonstrated that machine learning models can automate this process to predict the coordination environments of absorbing atoms from their XAS spectra. However, machine learning models are often difficult to interpret, making it challenging to determine when they are valid and whether they are consistent with physical theories. In this work, we present three main advances in the data-driven analysis of XAS spectra: we demonstrate the efficacy of random forests in solving two new property determination tasks (predicting Bader charge and mean nearest neighbor distance), we address how choices in data representation affect model interpretability and accuracy, and we show that multiscale featurization can elucidate the regions and trends in spectra that encode various local properties. The multiscale featurization transforms the spectrum into a vector of polynomial-fit features and is contrasted with the commonly used “pointwise” featurization, which uses the entire spectrum directly as input. We find that across thousands of transition metal oxide spectra, the relative importance of features describing the curvature of the spectrum can be localized to individual energy ranges, and we can separate the importance of constant, linear, quadratic, and cubic trends, as well as the white line energy. This work has the potential to assist rigorous theoretical interpretations, expedite experimental data collection, and automate analysis of XAS spectra, thus accelerating the discovery of new functional materials.
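
To make the contrast between the two featurizations concrete, the following is a minimal sketch, not the implementation used in this work. The abstract does not specify window sizes, the number of scales, or how the white line energy is located, so the function names, the `n_windows_per_scale` parameter, the rescaling of each window to [-1, 1], and the use of the absorption maximum as a stand-in for the white line energy are all illustrative assumptions. The synthetic spectrum is likewise invented for demonstration.

```python
import numpy as np


def pointwise_features(mu):
    """Pointwise featurization: the sampled absorption values, unchanged."""
    return np.asarray(mu, dtype=float).copy()


def multiscale_polynomial_features(energy, mu, n_windows_per_scale=(1, 2, 4), degree=3):
    """Fit a polynomial in each energy window at several scales.

    Assumption: windows are equal-count splits of the energy grid, and each
    window must contain at least degree + 1 points. For degree=3 every window
    contributes four coefficients, ordered constant, linear, quadratic, cubic.
    The final feature is the energy of maximum absorption, used here as a
    simple proxy for the white line energy.
    """
    energy = np.asarray(energy, dtype=float)
    mu = np.asarray(mu, dtype=float)
    features = []
    for n_windows in n_windows_per_scale:
        for idx in np.array_split(np.arange(energy.size), n_windows):
            e = energy[idx]
            # Map the window onto [-1, 1] so coefficients are comparable
            # across windows and the fit is well conditioned.
            midpoint = 0.5 * (e[0] + e[-1])
            half_width = 0.5 * (e[-1] - e[0])
            x = (e - midpoint) / half_width
            # polyfit returns coefficients from lowest to highest order.
            coeffs = np.polynomial.polynomial.polyfit(x, mu[idx], degree)
            features.extend(coeffs)
    features.append(energy[np.argmax(mu)])
    return np.array(features)


# Synthetic, illustrative spectrum: an arctangent edge step plus a Gaussian peak.
energy = np.linspace(-10.0, 40.0, 201)
mu = 0.5 + np.arctan(energy) / np.pi + 0.8 * np.exp(-0.5 * ((energy - 3.0) / 1.5) ** 2)

pointwise = pointwise_features(mu)
features = multiscale_polynomial_features(energy, mu)
```

In this sketch the pointwise vector has one entry per energy point, whereas the multiscale vector has four coefficients for each of the 1 + 2 + 4 windows plus one peak-energy entry. Because each coefficient is tied to a specific window and polynomial order, a feature importance assigned by a tree-based model can be read as "the quadratic trend in this energy range", which is the kind of localization the abstract describes.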
