An Automated Machine Learning architecture for the accelerated prediction of Metal-Organic Frameworks performance in energy and environmental applications

Abstract Due to their exceptional host-guest properties, Metal-Organic Frameworks (MOFs) are promising materials for the storage of various gases of environmental and technological interest. Molecular modeling and simulation are invaluable tools that have been used extensively over the last two decades to study the properties of MOFs. In particular, Monte Carlo simulation techniques have been employed to study the gas uptake capacity of several MOFs over a wide range of thermodynamic conditions. Although molecular simulations give accurate predictions, the detailed characterization and high-throughput screening of the enormous number of MOFs that could be synthesized by combining various structural building blocks is beyond present computing capabilities. In this work, we propose and demonstrate an alternative approach based on an Automated Machine Learning (AutoML) architecture that trains machine learning and statistical predictive models for MOFs’ chemical properties and estimates their predictive performance with confidence intervals. The architecture tries numerous combinations of different machine learning (ML) algorithms, tunes their hyper-parameters, and conservatively estimates the performance of the final model. We demonstrate that it correctly estimates performance even with few samples ( https://app.jadbio.com/share/86477fd7-d467-464d-ac41-fcbb0475444b .
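The workflow named in the abstract (try several configurations, tune them, and estimate the final model's performance conservatively with a confidence interval) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' architecture or the JADBIO implementation. It substitutes nested cross-validation and a percentile bootstrap as one common way to avoid optimistic bias. The k-nearest-neighbour regressor, the candidate values of `k`, the fold counts, and the synthetic one-descriptor data set are all assumptions made purely for illustration.

```python
import random
import statistics

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])

def folds(data, n_folds):
    """Yield (train, test) splits; data is assumed to be in random order already."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def cv_mae(data, k, n_folds):
    """Mean absolute error of k-NN estimated by plain cross-validation."""
    errors = [abs(knn_predict(tr, x, k) - y)
              for tr, te in folds(data, n_folds) for x, y in te]
    return statistics.fmean(errors)

def select_k(data, candidates, n_folds):
    """Pick the configuration with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_mae(data, k, n_folds))

def nested_cv_errors(data, candidates, outer=5, inner=4):
    """Out-of-sample absolute errors; tuning only ever sees the outer training part."""
    errors = []
    for train, test in folds(data, outer):
        k = select_k(train, candidates, inner)
        errors += [abs(knn_predict(train, x, k) - y) for x, y in test]
    return errors

def bootstrap_ci(values, rng, n_boot=1000, alpha=0.05):
    """Percentile bootstrap interval for the mean of the values."""
    means = sorted(statistics.fmean(rng.choices(values, k=len(values)))
                   for _ in range(n_boot))
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

rng = random.Random(0)
# Synthetic stand-in data (an assumption): one made-up descriptor x, noisy target y.
xs = [rng.uniform(0, 5) for _ in range(60)]
data = [(x, 2.0 * x + rng.gauss(0, 0.3)) for x in xs]
candidates = [1, 3, 5, 9, 15]  # hypothetical hyper-parameter grid

errors = nested_cv_errors(data, candidates)   # conservative performance estimate
estimate = statistics.fmean(errors)
ci = bootstrap_ci(errors, rng)
final_k = select_k(data, candidates, 4)       # final model is tuned on all data
print(f"selected k={final_k}, estimated MAE={estimate:.3f}, "
      f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The key design choice is that the reported error comes from outer folds that were never used for tuning, while the final configuration is chosen on all available samples. A real AutoML system would search over many algorithm families and preprocessing steps rather than a single hyper-parameter.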

[1]  David Farrusseng,et al.  Metal-Organic Frameworks: Applications from Catalysis to Gas Storage , 2011 .

[2]  Giorgos Borboudakis,et al.  Chemically intuited, large-scale screening of MOFs by machine learning techniques , 2017, npj Computational Materials.

[3]  Maciej Haranczyk,et al.  In Silico Discovery of High Deliverable Capacity Metal–Organic Frameworks , 2015 .

[4]  Chongli Zhong,et al.  Revealing the structure-property relationships of metal-organic frameworks for CO2 capture from flue gas. , 2012, Langmuir : the ACS journal of surfaces and colloids.

[5]  A. E. Hoerl,et al.  Ridge regression: biased estimation for nonorthogonal problems , 2000 .

[6]  Randall Q. Snurr,et al.  Large-Scale Quantitative Structure–Property Relationship (QSPR) Analysis of Methane Storage in Metal–Organic Frameworks , 2013 .

[7]  C. Wilmer,et al.  Large-scale screening of hypothetical metal-organic frameworks. , 2012, Nature chemistry.

[8]  Randall Q. Snurr,et al.  Evaluation of Force Field Performance for High-Throughput Screening of Gas Uptake in Metal–Organic Frameworks , 2015 .

[9]  Aaron Klein,et al.  Efficient and Robust Automated Machine Learning , 2015, NIPS.

[10]  Emil Pitkin,et al.  Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation , 2013, 1309.6392.

[11]  Maciej Haranczyk,et al.  In silico design of porous polymer networks: high-throughput screening for methane storage materials. , 2014, Journal of the American Chemical Society.

[12]  Abhoyjit S Bhown,et al.  In silico screening of carbon-capture materials. , 2012, Nature materials.

[13]  Perla B. Balbuena,et al.  Carbon dioxide capture-related gas adsorption and separation in metal-organic frameworks , 2011 .

[14]  Alessio Farcomeni,et al.  Feature Selection with the R Package MXM: Discovering Statistically-Equivalent Feature Subsets , 2016, 1611.03227.

[15]  Ranjan Srivastava,et al.  Machine Learning Using Combined Structural and Chemical Descriptors for Prediction of Methane Adsorption Performance of Metal Organic Frameworks (MOFs). , 2017, ACS combinatorial science.

[16]  Jeffrey R. Long,et al.  Evaluating metal–organic frameworks for natural gas storage , 2014 .

[17]  Diego A. Gómez-Gualdrón,et al.  Benchmark Study of Hydrogen Storage in Metal-Organic Frameworks under Temperature and Pressure Swing Conditions , 2018 .

[18]  Tom K Woo,et al.  Rapid and Accurate Machine Learning Recognition of High Performing Metal Organic Frameworks for CO2 Capture. , 2014, The journal of physical chemistry letters.

[19]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[20]  Paul R. Cohen,et al.  Multiple Comparisons in Induction Algorithms , 2000, Machine Learning.

[21]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[22]  Tom K. Woo,et al.  Robust Machine Learning Models for Predicting High CO2 Working Capacity and CO2/H2 Selectivity of Gas Adsorption in Metal Organic Frameworks for Precombustion Carbon Capture , 2019, The Journal of Physical Chemistry C.

[23]  Tom K. Woo,et al.  Atomic Property Weighted Radial Distribution Functions Descriptors of Metal–Organic Frameworks for the Prediction of Gas Uptake Capacity , 2013 .

[24]  Randal S. Olson,et al.  Automating Biomedical Data Science Through Tree-Based Pipeline Optimization , 2016, EvoApplications.

[25]  M. Allendorf,et al.  Metal‐Organic Frameworks: A Rapidly Growing Class of Versatile Nanoporous Materials , 2011, Advanced materials.

[26]  Peyman Z. Moghadam,et al.  Development of a Cambridge Structural Database Subset: A Collection of Metal-Organic Frameworks for Past, Present, and Future , 2017 .

[27]  Vincenzo Lagani,et al.  Performance-Estimation Properties of Cross-Validation-Based Protocols with Simultaneous Hyper-Parameter Optimization , 2014, Int. J. Artif. Intell. Tools.

[28]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[29]  Donald J. Siegel,et al.  Balancing gravimetric and volumetric hydrogen density in MOFs , 2017 .

[30]  Richard Simon,et al.  Bias in error estimation when using cross-validation for model selection , 2006, BMC Bioinformatics.

[31]  Cory M. Simon,et al.  High-Throughput Computational Screening of Multivariate Metal-Organic Frameworks (MTV-MOFs) for CO2 Capture. , 2017, The journal of physical chemistry letters.

[32]  Susumu Kitagawa,et al.  Chemistry of coordination space of porous coordination polymers , 2007 .

[33]  Tom K. Woo,et al.  Quantitative Structure–Property Relationship Models for Recognizing Metal Organic Frameworks (MOFs) with High CO2 Working Capacity and CO2/CH4 Selectivity for Methane Purification , 2016 .

[34]  Peter G. Boyd,et al.  Computational development of the nanoporous materials genome , 2017 .

[35]  Chongli Zhong,et al.  Exploring the structure-property relationships of covalent organic frameworks for noble gas separations , 2017 .

[36]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[37]  Giorgos Borboudakis,et al.  Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation , 2017, Machine Learning.

[38]  Lars Kotthoff,et al.  Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA , 2017, J. Mach. Learn. Res..

[39]  Siegmar Roth,et al.  Hydrogen adsorption in different carbon nanostructures , 2005 .

[40]  Maciej Haranczyk,et al.  Computation-Ready, Experimental Metal–Organic Frameworks: A Tool To Enable High-Throughput Screening of Nanoporous Crystals , 2014 .

[41]  Sergio Escalera,et al.  Design of the 2015 ChaLearn AutoML challenge , 2015, IJCNN.

[42]  David H. Wolpert,et al.  The Lack of A Priori Distinctions Between Learning Algorithms , 1996, Neural Computation.

[43]  Michael O’Keeffe,et al.  The Chemistry and Applications of Metal-Organic Frameworks , 2013, Science.

[44]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[45]  Senén Barro,et al.  Do we need hundreds of classifiers to solve real world classification problems? , 2014, J. Mach. Learn. Res..

[46]  H. Ohno,et al.  Machine Learning Approach for Prediction and Search: Application to Methane Storage in a Metal–Organic Framework , 2016 .