The Materials Simulation Toolkit for Machine learning (MAST-ML): An automated open source toolkit to accelerate data-driven materials research

Abstract

As data science and machine learning methods take on an increasingly important role in the materials research community, there is a need for machine learning software tools that are easy to use (even for nonexperts with no programming ability), provide flexible access to the most important algorithms, and codify best practices of machine learning model development and evaluation. Here, we introduce the Materials Simulation Toolkit for Machine Learning (MAST-ML), an open source Python-based software package designed to broaden and accelerate the use of machine learning in materials science research. MAST-ML provides predefined routines for many input setup, model fitting, and post-analysis tasks, as well as a simple structure for executing a multi-step machine learning model workflow. In this paper, we describe how MAST-ML streamlines and accelerates the execution of machine learning problems. We walk through how to acquire and run MAST-ML, demonstrate how to execute different components of a supervised machine learning workflow via a customized input file, and showcase a number of features and analyses conducted automatically during a MAST-ML run. Further, we demonstrate the utility of MAST-ML with examples of recent materials informatics studies that used it to formulate and evaluate machine learning models for an array of materials applications. Finally, we lay out a vision of how MAST-ML, together with complementary software packages and emerging cyberinfrastructure, can advance the rapidly growing field of materials informatics, with a focus on producing machine learning models easily, reproducibly, and in a manner that facilitates future model evolution and improvement.
