Crowd-sourcing materials-science challenges with the NOMAD 2018 Kaggle competition

A public data-analytics competition was organized by the Novel Materials Discovery (NOMAD) Centre of Excellence and hosted on the online platform Kaggle, using a dataset of 3,000 (AlxGayIn1–x–y)2O3 compounds. Its aim was to identify the best machine-learning (ML) model for predicting two key physical properties relevant for optoelectronic applications: the electronic bandgap energy and the crystalline formation energy. Here, we present a summary of the top-three ranked ML approaches. The first-place solution was based on a crystal-graph representation that is novel for the ML of properties of materials. The second-place model combined many candidate descriptors from a set of compositional, atomic-environment-based, and average structural properties with the light gradient-boosting machine (LightGBM) regression model. The third-place model employed the smooth overlap of atomic positions (SOAP) representation with a neural network. The Pearson correlation among the prediction errors of nine ML models (obtained by combining the top-three ranked representations with all three employed regression models) was examined to gain insight into whether the representation or the regression model determines the overall model performance. Ensembling relatively decorrelated models (based on the Pearson correlation) leads to an even higher prediction accuracy.
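
The error-correlation analysis and the ensembling step described above can be illustrated with a minimal sketch. It does not reproduce the competition models: the targets and the three sets of predictions are synthetic, the names `error_correlation`, `rmse`, `pred_a`, `pred_b`, and `pred_c` are hypothetical, the ensemble is a plain unweighted average, and the root-mean-square error is used only as a convenient accuracy measure for the illustration.

```python
import numpy as np


def error_correlation(y_true, predictions):
    """Pearson correlation matrix between the signed errors of several models.

    `predictions` has shape (n_models, n_samples); entry (i, j) of the result
    is the Pearson correlation between the errors of model i and model j.
    """
    errors = np.asarray(predictions) - np.asarray(y_true)
    return np.corrcoef(errors)


def rmse(y_true, y_pred):
    """Root-mean-square error (an assumed metric for this illustration)."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-ins for reference values and model predictions.
rng = np.random.default_rng(0)
n = 1000
y_true = rng.uniform(0.0, 5.0, size=n)

# Models a and b share most of their error; model c errs independently.
shared_error = rng.normal(0.0, 0.1, size=n)
pred_a = y_true + shared_error + rng.normal(0.0, 0.02, size=n)
pred_b = y_true + shared_error + rng.normal(0.0, 0.02, size=n)
pred_c = y_true + rng.normal(0.0, 0.1, size=n)

corr = error_correlation(y_true, [pred_a, pred_b, pred_c])

# Unweighted two-model ensembles: correlated pair vs. decorrelated pair.
ensemble_ab = (pred_a + pred_b) / 2.0
ensemble_ac = (pred_a + pred_c) / 2.0

print("error correlation matrix:\n", np.round(corr, 2))
print("RMSE a:", rmse(y_true, pred_a))
print("RMSE a+b ensemble:", rmse(y_true, ensemble_ab))
print("RMSE a+c ensemble:", rmse(y_true, ensemble_ac))
```

In this toy setting, averaging the two models with strongly correlated errors barely improves on either one, whereas averaging the two models with nearly uncorrelated errors cancels part of the error and lowers the RMSE.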
