Universal fragment descriptors for predicting properties of inorganic crystals

Although materials discovery has historically been driven by a laborious trial-and-error process, knowledge-driven materials design is now becoming possible through the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository of ab initio calculations are combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk and shear moduli, Debye temperature, and heat capacities. For virtually any stoichiometric inorganic crystalline material, the accuracy of the predictions is comparable to the quality of the training data, and the models reproduce the available experimental thermomechanical data. The universality of the approach is attributed to the construction of the descriptors, Property-Labelled Materials Fragments. These representations require only minimal structural input, allowing straightforward implementation of simple heuristic design rules.
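The abstract names the descriptors but does not spell out how they are built. As a rough, hypothetical illustration of the general idea of labelling structural fragments with atomic properties, the sketch below treats a small finite cluster as a graph whose bonds are assumed to be given (in practice they would come from a neighbour analysis of the crystal structure), enumerates simple path fragments, and summarises the property differences along them. The property table (Pauling electronegativities), the fragment types and the feature names are assumptions made for illustration; this is not the authors' Property-Labelled Materials Fragments definition.

```python
from statistics import mean

# Assumed property table for illustration only: Pauling electronegativities.
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Zn": 1.65, "O": 3.44}


def adjacency(n_atoms, bonds):
    """Build an undirected adjacency map from (i, j) bond pairs."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def path_fragments(adj, length):
    """Enumerate simple paths containing `length` bonds, each counted once."""
    paths = set()

    def extend(path):
        if len(path) == length + 1:
            # Keep one canonical orientation so i-j-k and k-j-i coincide.
            paths.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return sorted(paths)


def fragment_descriptors(species, bonds, prop=ELECTRONEGATIVITY, max_length=2):
    """Hypothetical fragment features: for each path length, the mean and the
    max over fragments of the average |property difference| between
    consecutive (bonded) atoms in the fragment."""
    adj = adjacency(len(species), bonds)
    features = {}
    for length in range(1, max_length + 1):
        diffs = []
        for path in path_fragments(adj, length):
            labels = [prop[species[a]] for a in path]
            diffs.append(mean(abs(x - y) for x, y in zip(labels, labels[1:])))
        features[f"mean_dprop_len{length}"] = mean(diffs) if diffs else 0.0
        features[f"max_dprop_len{length}"] = max(diffs) if diffs else 0.0
    return features


if __name__ == "__main__":
    # Linear Cl-Na-Cl cluster: atom 0 is Na, bonded to two Cl atoms.
    print(fragment_descriptors(["Na", "Cl", "Cl"], [(0, 1), (0, 2)]))
```

For the linear Cl-Na-Cl cluster, both the pair and the triplet features equal the Na-Cl electronegativity difference of about 2.23. A periodic crystal would also need bonds to periodic images, which this toy version ignores.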