Band Gap Prediction for Large Organic Crystal Structures with Machine Learning

Machine-learning models can capture structure-property relationships from datasets of computationally demanding ab initio calculations. Over the past two years, the Organic Materials Database (OMDB) has hosted a growing number of calculated electronic properties of previously synthesized organic crystal structures. The complexity of the organic crystals contained within the OMDB, which have on average 82 atoms per unit cell, makes this database a challenging platform for machine learning applications. This paper focuses on predicting the band gap, one of the basic properties of a crystalline material. To this end, a consistent dataset of 12 500 crystal structures and their corresponding DFT band gaps is released and made freely available for download. An ensemble of two state-of-the-art models reaches a mean absolute error (MAE) of 0.388 eV, which corresponds to a percentage error of 13% relative to an average band gap of 3.05 eV. Finally, the trained models are employed to predict the band gap for 260 092 materials contained within the Crystallography Open Database (COD) and are made available online, so that predictions can be obtained for any crystal structure uploaded by a user.
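The error figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, the toy band gaps are made-up illustration values, and the assumption that the two models are combined by a plain average is ours, since the abstract does not state how the ensemble is formed.

```python
from statistics import fmean


def ensemble_average(preds_a, preds_b):
    """Combine two models by element-wise averaging (assumed rule, not from the paper)."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(preds_a, preds_b)]


def mean_absolute_error(targets, preds):
    """Mean absolute error between reference and predicted band gaps (in eV)."""
    if len(targets) != len(preds):
        raise ValueError("targets and predictions must have the same length")
    return fmean(abs(t - p) for t, p in zip(targets, preds))


def relative_error_percent(mae, mean_target):
    """Express an MAE as a percentage of the average target value."""
    return 100.0 * mae / mean_target


# Figures quoted in the abstract: MAE = 0.388 eV, average band gap = 3.05 eV.
reported_percent = relative_error_percent(0.388, 3.05)  # about 12.7, quoted as 13%

# Toy illustration with made-up band gaps (eV); these are NOT data from the OMDB.
toy_targets = [3.0, 2.0, 4.0]
toy_model_a = [3.4, 1.8, 4.2]
toy_model_b = [3.0, 1.6, 4.6]
toy_ensemble = ensemble_average(toy_model_a, toy_model_b)
toy_mae = mean_absolute_error(toy_targets, toy_ensemble)
```

The percentage is computed relative to the dataset's mean band gap rather than per structure, which matches how the abstract phrases the 13% figure.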
