MATCOR, a program for the cross-validation of material properties between databases

Abstract

Data analytics approaches are increasingly used to facilitate property-specific materials discovery. The uncertainties in these approaches can be strongly affected by the fidelity of the data sets used to train the data models. Data curation is therefore an essential step for obtaining well-constrained model predictions. This can be a challenging task, especially for data sets that are too large for human quality control. We developed MATCOR, an open-source, user-friendly, easily adaptable software package to facilitate the data curation process. MATCOR processes lists of material identifiers in either AFLOW or Materials Project format and searches for the best-matching material entry in the other database. This is a non-trivial task because the two databases label materials differently and material labels are not always used uniquely. MATCOR uses a combination of characteristics, such as space group, compound formula, crystal structure, and use of Hubbard-U, to provide the best possible comparison between databases. The capabilities of MATCOR are demonstrated for density, elastic properties, magnetic properties, and band gap correlations between AFLOW and Materials Project. Density shows the highest correlation among the tested properties: 93% of verified densities agree to within ±2%. Bulk and shear moduli show deviations of less than ±10% for 80.6% and 65.1% of the materials, respectively. The classifications of materials as non-magnetic/paramagnetic and as metallic/gapped are consistent between the two databases for 91% and 69% of the materials, respectively. These examples show that MATCOR can be used to automate, and thereby accelerate, the data curation process prior to materials discovery through data analytics models.
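As an illustration of the kind of matching and agreement statistics described above, the following minimal Python sketch may help. It is not MATCOR's implementation: the `Entry` record, the `best_match` scoring rule, the `fraction_within` helper, and all identifiers and numerical values are hypothetical and chosen only for this example. The sketch assumes that the compound formula must match exactly, that space group and Hubbard-U usage are then used to rank candidates, and that the relative deviation is measured against the second value of each pair.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified database record (not MATCOR's data model)."""
    identifier: str
    formula: str          # reduced compound formula, e.g. "MgO"
    space_group: int      # international space group number
    uses_hubbard_u: bool  # whether the calculation applied a Hubbard-U term
    density: float        # g/cm^3


def best_match(query: Entry, candidates: Sequence[Entry]) -> Optional[Entry]:
    """Return the candidate sharing the most characteristics with `query`.

    Assumption: the formula must match exactly; space group and Hubbard-U
    usage each add one point to the score. Ties keep the first candidate.
    """
    same_formula = [c for c in candidates if c.formula == query.formula]
    if not same_formula:
        return None

    def score(c: Entry) -> int:
        return int(c.space_group == query.space_group) + int(
            c.uses_hubbard_u == query.uses_hubbard_u
        )

    return max(same_formula, key=score)


def fraction_within(pairs: List[Tuple[float, float]], tolerance: float) -> float:
    """Fraction of (a, b) pairs with |a - b| <= tolerance * |b|."""
    if not pairs:
        raise ValueError("at least one pair is required")
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance * abs(b))
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up entries for illustration only; not values from either database.
    query = Entry("db-a-1", "MgO", 225, False, 3.50)
    candidates = [
        Entry("db-b-1", "MgO", 225, False, 3.47),
        Entry("db-b-2", "MgO", 221, False, 3.90),
        Entry("db-b-3", "NiO", 225, True, 6.70),
    ]
    match = best_match(query, candidates)
    print(match.identifier if match else "no match")
    pairs = [(query.density, c.density) for c in candidates[:2]]
    print(fraction_within(pairs, 0.02))
```

A simple additive score keeps the sketch short; a real tool would likely also compare the crystal structures themselves and handle ambiguous or missing labels.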
