Uncovering structure-property relationships of materials by subgroup discovery

Subgroup discovery (SGD) is presented here as a data-mining approach for finding interpretable local patterns, correlations, and descriptors of a target property in materials-science data. Specifically, we focus on data generated by density-functional theory calculations. First, we demonstrate that SGD can identify physically meaningful models that classify the crystal structures of 82 octet binary semiconductors as either rocksalt or zincblende. SGD identifies an interpretable two-dimensional model, derived only from the atomic radii of the valence s and p orbitals, that correctly classifies the crystal structures of 79 of the 82 octet binary semiconductors. The SGD framework is then applied to 24 400 configurations of neutral gas-phase gold clusters with 5 to 14 atoms to discern general patterns between geometrical and physicochemical properties. For example, SGD helps find that van der Waals interactions within gold clusters are linearly correlated with their radius of gyration and are weaker for planar clusters than for nonplanar clusters. In addition, a descriptor that predicts a local linear correlation between the chemical hardness and the cluster isomer stability is found for the even-sized gold clusters.
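To make the idea of a subgroup concrete, the following is a minimal sketch of subgroup discovery on a binary target. It is not the implementation used in this work. It assumes threshold propositions on numeric features as selectors, an exhaustive search over conjunctions of at most two selectors, and weighted relative accuracy (coverage times the difference between the subgroup and global positive rates) as the quality function. The feature names `x1` and `x2`, the toy data, and all function names are hypothetical.

```python
from itertools import combinations


def wracc(mask, labels):
    """Weighted relative accuracy: coverage * (subgroup rate - global rate)."""
    n = len(labels)
    covered = [y for m, y in zip(mask, labels) if m]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    return coverage * (sum(covered) / len(covered) - sum(labels) / n)


def candidate_selectors(rows, features):
    """Build threshold propositions 'f <= t' and 'f > t' at midpoints."""
    selectors = []
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            selectors.append((f"{f} <= {t:g}", lambda r, f=f, t=t: r[f] <= t))
            selectors.append((f"{f} > {t:g}", lambda r, f=f, t=t: r[f] > t))
    return selectors


def discover_subgroups(rows, labels, features, max_depth=2, top_k=3):
    """Score every conjunction of up to max_depth selectors; return the best."""
    selectors = candidate_selectors(rows, features)
    results = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(selectors, depth):
            mask = [all(pred(r) for _, pred in combo) for r in rows]
            description = " AND ".join(name for name, _ in combo)
            results.append((wracc(mask, labels), description, sum(mask)))
    # Stable sort: among equal qualities, shorter descriptions stay first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_k]


# Hypothetical toy data: two made-up features and a binary class label.
points = [(1, 1, 0), (1, 2, 0), (2, 1, 0), (2, 2, 0),
          (3, 1, 1), (3, 2, 1), (4, 1, 1), (4, 2, 0)]
rows = [{"x1": a, "x2": b} for a, b, _ in points]
labels = [y for _, _, y in points]

top = discover_subgroups(rows, labels, ["x1", "x2"])
for quality, description, size in top:
    print(f"{description}: quality={quality:.4f}, size={size}")
```

Each result is a human-readable selector, which is what makes the discovered patterns interpretable. The exhaustive search is only feasible for toy problems; realistic data sets call for a pruned or sampled search.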
