Assessing the frontier: Active learning, model accuracy, and multi-objective candidate discovery and optimization

Discovering novel chemicals and materials can be greatly accelerated by the iterative, machine-learning-informed proposal of candidates, an approach known as active learning. However, standard global error metrics for model quality are not predictive of discovery performance and can be misleading. We introduce the notion of Pareto shell error to help judge the suitability of a model for proposing candidates. Furthermore, using synthetic cases, an experimental thermoelectric dataset, and a computational organic molecule dataset, we probe the relation between acquisition function fidelity and active learning performance. The results suggest novel diagnostic tools, as well as new insights into acquisition function design.
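To make the idea of error measured near the frontier concrete, the sketch below partitions candidates into successive non-dominated layers (Pareto shells) and computes model error only on the outermost ones. It is an illustrative sketch under stated assumptions, not the authors' exact definition. It assumes all objectives are maximized, assigns shells using the true property values, and reports per-objective root-mean-square error. The function names `pareto_shells` and `pareto_shell_error` and the parameter `n_shells` are hypothetical.

```python
import numpy as np


def pareto_shells(Y):
    """Assign each row of Y (n_points x n_objectives, larger is better) to a shell.

    Shell 0 is the non-dominated (Pareto) front; shell 1 is the front of the
    points left after removing shell 0, and so on (non-dominated sorting).
    """
    Y = np.asarray(Y, dtype=float)
    shell = np.full(Y.shape[0], -1, dtype=int)
    remaining = np.arange(Y.shape[0])
    k = 0
    while remaining.size > 0:
        sub = Y[remaining]
        # ge[i, j]: point i is >= point j in every objective
        ge = np.all(sub[:, None, :] >= sub[None, :, :], axis=2)
        # gt[i, j]: point i is > point j in at least one objective
        gt = np.any(sub[:, None, :] > sub[None, :, :], axis=2)
        dominates = ge & gt                  # dominates[i, j]: i dominates j
        dominated = dominates.any(axis=0)    # j is dominated by some other point
        shell[remaining[~dominated]] = k
        remaining = remaining[dominated]
        k += 1
    return shell


def pareto_shell_error(y_true, y_pred, n_shells=1):
    """Hypothetical metric: per-objective RMSE restricted to the first n_shells shells."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = pareto_shells(y_true) < n_shells
    resid = y_pred[mask] - y_true[mask]
    return np.sqrt(np.mean(resid ** 2, axis=0))


# Toy two-objective example: three frontier points and two interior points.
Y_true = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
Y_pred = Y_true.copy()
Y_pred[:3] += 1.0  # model is wrong only on the frontier points

print(pareto_shells(Y_true))                             # [0 0 0 1 2]
print(pareto_shell_error(Y_true, Y_pred, n_shells=1))    # [1. 1.]
print(pareto_shell_error(Y_true, Y_pred, n_shells=3))    # about [0.775 0.775]
```

On this toy data the model is exact away from the frontier but off by one unit on every frontier point, so the first-shell error is 1.0 per objective while the error over all points is about 0.77. A global metric dilutes exactly the errors that matter most when proposing candidates.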
