Improving Data and Prediction Quality of High-Throughput Perovskite Synthesis with Model Fusion

Combinatorial fusion analysis (CFA) is an approach for combining multiple scoring systems using the rank-score characteristic (RSC) function and the cognitive diversity measure. One application is combining diverse machine learning models to improve prediction quality. In this work, we apply CFA to the synthesis of metal halide perovskites containing organic ammonium cations via inverse temperature crystallization. Using a data set generated by high-throughput experimentation, we developed four individual models: a support vector machine, a random forest, a weighted logistic classifier, and gradient-boosted trees. We characterize each of these scoring systems and explore 66 possible combinations of the models. When measured by precision in predicting crystal formation, the majority of the combination models improve on the individual model results, and the best combination models outperform the best individual models by 3.9 percentage points. Beyond improving prediction quality, we demonstrate how the fusion models can be used to identify mislabeled input data and address issues of data quality. In particular, we identify cases where none of the single or fusion models gives the correct prediction. Experimental replication of these syntheses reveals that these compositions are sensitive to modest temperature variations across different locations on the heating element, which can hinder or enhance crystallization. In summary, we demonstrate that model fusion using CFA can not only identify a previously unconsidered influence on reaction outcome but also serve as a form of quality control for high-throughput experimentation.
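The abstract names the two CFA ingredients, the RSC function and cognitive diversity, without defining them. The Python sketch below follows their common definitions in the CFA literature. The RSC function f_A(i) is the normalized score of the item at rank i. The cognitive diversity CD(A, B) is the root-mean-square difference between two RSC functions.

Several choices here are assumptions, not the paper's method:

- scores are min-max normalized;
- every system is turned into crystal/no-crystal predictions by labeling its top-k samples positive, where k is the number of observed crystals;
- fusions use three weighting schemes: equal weights, precision weights, and diversity-strength weights, with diversity strength taken as a model's mean cognitive diversity to the other models.

The function names (`cfa`, `fuse`, and the others) and the toy scores are hypothetical. The final step mirrors the abstract's data-quality idea. Samples that every single model and every fusion model misclassifies are returned as candidates for relabeling or replication.

```python
from itertools import combinations

import numpy as np


def normalize(scores):
    """Min-max normalize a score vector to [0, 1] so systems are comparable."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span


def rank_function(scores):
    """r_A: rank 1 goes to the highest score (ties broken by input order)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    r = np.empty(len(order))
    r[order] = np.arange(1, len(order) + 1)
    return r


def rsc(scores):
    """Rank-score characteristic f_A(i): normalized score of the item at rank i."""
    return np.sort(normalize(scores))[::-1]


def cognitive_diversity(a, b):
    """CD(A, B): root-mean-square distance between two RSC functions."""
    return float(np.sqrt(np.mean((rsc(a) - rsc(b)) ** 2)))


def top_k_predictions(scores, k):
    """Predict 'crystal' (1) for the k highest-scoring samples, 0 otherwise."""
    pred = np.zeros(len(scores), dtype=int)
    pred[np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]] = 1
    return pred


def precision(labels, pred):
    tp = int(np.sum((pred == 1) & (labels == 1)))
    return tp / max(int(np.sum(pred == 1)), 1)


def fuse(systems, weights, method):
    """Weighted score combination, or weighted rank combination negated so higher is better."""
    if weights is not None and sum(weights) == 0:
        weights = None  # fall back to equal weights
    if method == "score":
        return np.average([normalize(s) for s in systems], axis=0, weights=weights)
    return -np.average([rank_function(s) for s in systems], axis=0, weights=weights)


def cfa(models, labels):
    """Evaluate every fusion of >= 2 models; return precisions and flagged sample indices."""
    labels = np.asarray(labels)
    k = int(labels.sum())  # assumption: top-k decision rule, k = number of crystals
    names = list(models)
    perf = {n: precision(labels, top_k_predictions(models[n], k)) for n in names}
    # Diversity strength (assumed definition): mean CD of a model to all others.
    ds = {n: np.mean([cognitive_diversity(models[n], models[m]) for m in names if m != n])
          for n in names}
    results = dict(perf)
    all_preds = [top_k_predictions(models[n], k) for n in names]
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            systems = [models[n] for n in combo]
            for scheme, w in (("avg", None),
                              ("perf", [perf[n] for n in combo]),
                              ("ds", [ds[n] for n in combo])):
                for method in ("score", "rank"):
                    pred = top_k_predictions(fuse(systems, w, method), k)
                    results[f"{'+'.join(combo)}|{method}|{scheme}"] = precision(labels, pred)
                    all_preds.append(pred)
    # Samples that every single and fusion model gets wrong are data-quality suspects.
    wrong_everywhere = np.all(np.array(all_preds) != labels, axis=0)
    return results, np.flatnonzero(wrong_everywhere)


# Hypothetical toy scores for four models on six reactions (1 = crystal observed).
labels = [1, 1, 1, 0, 0, 0]
models = {
    "svm": [0.9, 0.8, 0.1, 0.2, 0.3, 0.95],
    "rf": [0.7, 0.9, 0.2, 0.1, 0.4, 0.85],
    "wlc": [0.8, 0.6, 0.05, 0.3, 0.2, 0.99],
    "gbt": [0.95, 0.7, 0.15, 0.25, 0.1, 0.9],
}
results, flagged = cfa(models, labels)
print(len(results) - len(models))  # -> 66
print(flagged.tolist())  # -> [2, 5]
```

For four base models, the nested loops produce 11 model subsets × 2 combination methods × 3 weighting schemes = 66 fusion models. Whether this matches how the paper counts its 66 combinations is an assumption. On the toy data, every system ranks samples 0, 1, and 5 highest, so sample 2 (a labeled crystal) and sample 5 (a labeled failure) are flagged. In a real study, the precision weights and the flagging should be computed on held-out data rather than on the training set.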
