Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design

One of the main challenges in materials discovery is efficiently exploring the vast search space for targeted properties, because approaches that rely on trial and error are impractical. We review how methods from the information sciences enable us to accelerate the search for and discovery of new materials. In particular, active learning allows us to navigate the search space iteratively and identify promising candidates for guiding experiments and computations. The approach relies on a surrogate model that provides predictions together with their uncertainties, and on a utility function that uses both to prioritize which unexplored data points to evaluate next. We discuss several utility functions and demonstrate their use in materials science applications, impacting both experimental and computational research. We conclude by indicating generalizations to multiple properties and multifidelity data, and by identifying challenges, future directions, and opportunities in the emerging field of materials informatics.
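To make the role of the utility function concrete, the following is a minimal sketch of a single selection step. It assumes expected improvement (EI), one commonly used utility function, applied to a maximization problem. The candidate names, predicted means, and uncertainties are hypothetical placeholders rather than data from any study, and the helper names `expected_improvement` and `select_next` are introduced here for illustration only.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a maximization problem.

    mu and sigma are the surrogate model's predicted mean and standard
    deviation for one candidate; best is the best value measured so far.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf


def select_next(candidates, best):
    """Return the name of the unexplored candidate with the highest EI."""
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best),
    )


# Hypothetical surrogate predictions: name -> (mean, standard deviation).
candidates = {
    "A": (1.00, 0.05),  # mean equal to the current best, low uncertainty
    "B": (0.90, 0.50),  # slightly lower mean, high uncertainty
    "C": (0.50, 0.10),  # low mean, low uncertainty
}
best_so_far = 1.0

choice = select_next(candidates, best_so_far)
print(choice)
```

In this toy setting the candidate with the larger uncertainty is selected even though its predicted mean is lower, which illustrates how a utility function trades off exploitation of the model's predictions against exploration of poorly sampled regions. In a full active learning loop, the selected candidate would be measured or computed, added to the training data, and the surrogate model refit before the next selection.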
