Generative adversarial networks (GAN) based efficient sampling of chemical composition space for inverse design of inorganic materials

A major challenge in materials design is how to efficiently search the vast chemical design space for materials with desired properties. One effective strategy is to develop sampling algorithms that exploit both explicit chemical knowledge and the implicit composition rules embodied in large materials databases. Here, we propose MatGAN, a generative machine learning model based on a generative adversarial network (GAN), for the efficient generation of new hypothetical inorganic materials. Trained on materials from the Inorganic Crystal Structure Database (ICSD), our GAN model generates hypothetical materials absent from the training dataset, reaching a novelty of 92.53% over 2 million generated samples. When the GAN is trained on chemically valid (charge-neutral and electronegativity-balanced) compositions screened from ICSD, 84.5% of the samples it generates are also chemically valid, even though no such chemical rules are explicitly enforced in the model. This indicates that MatGAN can learn the implicit chemical composition rules by which elements form compounds. We expect our algorithm to greatly expand the design space available for inverse design and large-scale computational screening of inorganic materials.
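The two headline metrics, novelty and chemical validity, can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the MatGAN implementation: `is_chemically_valid`, `canonical`, and `novelty` are hypothetical helpers, and the oxidation-state and Pauling electronegativity tables cover only a handful of elements. A composition is treated as valid if at least one assignment of oxidation states makes it charge-neutral with every cation less electronegative than every anion. Novelty is taken as the percentage of generated samples whose GCD-reduced formula does not occur in the training set, with duplicates counted. Both definitions are our assumptions about how such checks are commonly done.

```python
from functools import reduce
from itertools import product
from math import gcd

# Illustrative subset only (assumption): a few common oxidation states and
# Pauling electronegativities. A real screen would use complete tables.
OX_STATES = {
    "Li": (1,), "Na": (1,), "Mg": (2,), "Sr": (2,),
    "Ti": (2, 3, 4), "Fe": (2, 3),
    "O": (-2,), "S": (-2, 4, 6), "Cl": (-1,),
}
PAULING_EN = {
    "Li": 0.98, "Na": 0.93, "Mg": 1.31, "Sr": 0.95,
    "Ti": 1.54, "Fe": 1.83, "O": 3.44, "S": 2.58, "Cl": 3.16,
}


def is_chemically_valid(comp: dict[str, int]) -> bool:
    """Hypothetical check: True if some oxidation-state assignment is both
    charge-neutral and electronegativity-balanced."""
    elements = list(comp)
    if any(el not in OX_STATES for el in elements):
        return False  # element not covered by the toy tables
    for states in product(*(OX_STATES[el] for el in elements)):
        # Charge neutrality: sum of (oxidation state x atom count) must be zero.
        if sum(q * comp[el] for el, q in zip(elements, states)) != 0:
            continue
        cations = [PAULING_EN[el] for el, q in zip(elements, states) if q > 0]
        anions = [PAULING_EN[el] for el, q in zip(elements, states) if q < 0]
        # Electronegativity balance: every cation less electronegative than every anion.
        if cations and anions and max(cations) < min(anions):
            return True
    return False


def canonical(comp: dict[str, int]) -> tuple[tuple[str, int], ...]:
    """Reduce counts by their GCD so that Fe4O6 and Fe2O3 share one key."""
    g = reduce(gcd, comp.values(), 0)
    return tuple(sorted((el, n // g) for el, n in comp.items()))


def novelty(generated: list[dict[str, int]], training: list[dict[str, int]]) -> float:
    """Percentage of generated samples (duplicates counted) absent from training."""
    seen = {canonical(c) for c in training}
    novel = sum(canonical(c) not in seen for c in generated)
    return 100.0 * novel / len(generated)


if __name__ == "__main__":
    training = [{"Na": 1, "Cl": 1}, {"Fe": 2, "O": 3}]
    generated = [{"Fe": 4, "O": 6}, {"Sr": 1, "Ti": 1, "O": 3}, {"Na": 1, "O": 1}]
    print(novelty(generated, training))  # 2 of 3 samples are novel
    print([is_chemically_valid(c) for c in generated])  # [True, True, False]
```

In this toy run, Fe4O6 reduces to the training entry Fe2O3 and so is not novel, while NaO fails validity because no listed oxidation states balance its charge. The validity percentage can be computed the same way as `novelty`, by averaging `is_chemically_valid` over the generated samples.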
