JAMIP: an artificial-intelligence aided data-driven infrastructure for computational materials informatics.

Abstract Materials informatics has emerged as a promising new paradigm for accelerating materials discovery and design. It applies machine learning methods to massive materials data from experiments or simulations to seek new materials, functionalities, and principles. Developing specialized facilities to generate, collect, manage, learn from, and mine large-scale materials data is crucial to materials informatics. We herein developed an artificial-intelligence-aided data-driven infrastructure named Jilin Artificial-intelligence aided Materials-design Integrated Package (JAMIP), an open-source Python framework that meets the research requirements of computational materials informatics. It integrates a materials production factory, a high-throughput first-principles calculation engine, automatic task submission and progress monitoring, a data extraction, management, and storage system, and machine-learning-based data mining functions. We have also integrated specific features, such as an inorganic crystal structure prototype database to facilitate high-throughput calculations, and essential modules for machine learning studies of functional materials. We demonstrate how the code is useful for exploring the materials informatics of optoelectronic semiconductors, taking halide perovskites as a typical case. By following the principles of automation, extensibility, reliability, and intelligence, JAMIP is a promising tool for the fast-growing field of computational materials informatics.
