Open Challenges in Developing Generalizable Large-Scale Machine-Learning Models for Catalyst Discovery

The development of machine-learned potentials for catalyst discovery has predominantly focused on very specific chemistries and material compositions. While effective at interpolating between available materials, these approaches struggle to generalize across chemical space. The recent curation of large-scale catalyst datasets has offered the opportunity to build a universal machine learning potential spanning chemical and compositional space. If accomplished, such a potential could accelerate the catalyst discovery process across a variety of applications (CO2 reduction, NH3 production, etc.) without the additional specialized training efforts that are currently required. The release of the Open Catalyst 2020 Dataset (OC20)¹ has begun to do just that, pushing the heterogeneous catalysis and machine learning communities towards building more accurate and robust models. In this perspective, we discuss some of the challenges and findings of recent developments on OC20. We examine the performance of current models across different materials and adsorbates to identify notably underperforming subsets. We then discuss some of the modeling efforts surrounding energy conservation, approaches to finding and evaluating local minima, and augmentation of off-equilibrium data. To complement the community's ongoing developments, we end with an outlook on some of the important challenges that have yet to be thoroughly explored for large-scale catalyst discovery.

[1] Zachary W. Ulissi, et al. Transfer learning using attentions across atomic systems with graph neural networks (TAAG), 2022, The Journal of Chemical Physics.

[2] Simon L. Batzner, et al. Learning local equivariant representations for large-scale atomistic dynamics, 2022, Nature Communications.

[3] Zachary W. Ulissi, et al. How Do Graph Networks Generalize to Large and Diverse Molecular Systems?, 2022, ArXiv.

[4] Brandon M. Wood, et al. Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations, 2022, ICLR.

[5] Andrew S. Rosen, et al. Realizing the data-driven, computational discovery of metal-organic framework catalysts, 2021, Current Opinion in Chemical Engineering.

[6] Ole Winther, et al. Calibrated uncertainty for molecular property prediction using ensembles of message passing neural networks, 2021, Mach. Learn. Sci. Technol.

[7] L. Ricardez-Sandoval, et al. Machine Learning in Solid Heterogeneous Catalysis: Recent Developments, Challenges and Perspectives, 2021, Chemical Engineering Science.

[8] Alain C. Vaucher, et al. Grand Challenges on Accelerating Discovery in Catalysis, 2021, Catalysis Today.

[9] Haitao Huang, et al. Unravelling the origin of bifunctional OER/ORR activity for single-atom catalysts supported on C2N by DFT and machine learning, 2021, Journal of Materials Chemistry A.

[10] Oriol Vinyals, et al. Highly accurate protein structure prediction with AlphaFold, 2021, Nature.

[11] C. Lawrence Zitnick, et al. Rotation Invariant Graph Neural Networks using Spin Convolutions, 2021, ArXiv.

[12] P. Battaglia, et al. Simple GNN Regularisation for 3D Molecular Property Prediction & Beyond, 2021, arXiv:2106.07971.

[13] Di He, et al. Do Transformers Really Perform Bad for Graph Representation?, 2021, ArXiv.

[14] Nuha Y. Elamin, et al. Adsorption Behavior of Congo Red onto Barium-Doped ZnO Nanoparticles: Correlation between Experimental Results and DFT Calculations, 2021, Langmuir: The ACS Journal of Surfaces and Colloids.

[15] Florian Becker, et al. GemNet: Universal Directional Graph Neural Networks for Molecules, 2021, NeurIPS.

[16] P. Hu, et al. Perspective on computational reaction prediction using machine learning methods in heterogeneous catalysis, 2021, Physical Chemistry Chemical Physics: PCCP.

[17] J. Leskovec, et al. ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations, 2021, ArXiv.

[18] Shuiwang Ji, et al. Spherical Message Passing for 3D Graph Networks, 2021, ArXiv.

[19] Michael Gastegger, et al. Equivariant message passing for the prediction of tensorial properties and molecular spectra, 2021, ICML.

[20] Chi Chen, et al. AtomSets as a hierarchical transfer learning framework for small and large materials datasets, 2021, npj Computational Materials.

[21] Andrew J. Medford, et al. A Universal Framework for Featurization of Atomistic Systems, 2021, ArXiv.

[22] Zachary W. Ulissi, et al. Open Catalyst 2020 (OC20) Dataset and Community Challenges, 2020, ACS Catalysis.

[23] Zachary W. Ulissi, et al. Enabling robust offline active learning for machine learning potentials using simple physics-based priors, 2020, Mach. Learn. Sci. Technol.

[24] Thomas F. Miller, et al. UNiTE: Unitary N-body Tensor Equivariant Network with Applications to Quantum Chemistry, 2021, ArXiv.

[25] Johannes T. Margraf, et al. Fast and Uncertainty-Aware Directional Message Passing for Non-Equilibrium Molecules, 2020, ArXiv.

[26] Zachary W. Ulissi, et al. Differentiable Optimization for the Prediction of Ground State Structures (DOGSS), 2020, Physical Review Letters.

[27] An Chen, et al. A Machine Learning Model on Simple Features for CO2 Reduction Electrocatalysts, 2020.

[28] O. Anatole von Lilienfeld, et al. On the role of gradients for machine learning of molecular energies and forces, 2020, Mach. Learn. Sci. Technol.

[29] Frederick R. Manby, et al. OrbNet: Deep Learning for Quantum Chemistry Using Symmetry-Adapted Atomic-Orbital Features, 2020, The Journal of Chemical Physics.

[30] Barbara Pernici, et al. Evaluating Scalable Uncertainty Estimation Methods for Deep Learning-Based Molecular Property Prediction, 2020, J. Chem. Inf. Model.

[31] Stephan Günnemann, et al. Directional Message Passing for Molecular Graphs, 2020, ICLR.

[32] William A. Goddard, et al. Predicted Optimal Bifunctional Electrocatalysts for Both HER and OER Using Chalcogenide Heterostructures Based on Machine Learning Analysis of In Silico Quantum Mechanics Based High Throughput Screening, 2020, The Journal of Physical Chemistry Letters.

[33] Zachary W. Ulissi, et al. Methods for comparing uncertainty quantifications for material property predictions, 2019, Mach. Learn. Sci. Technol.

[34] Justin S. Smith, et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules, 2019, Scientific Data.

[35] Simon L. Batzner, et al. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events, 2019, npj Computational Materials.

[36] Núria López, et al. Statistical learning goes beyond the d-band model providing the thermochemistry of adsorbates on transition metals, 2019, Nature Communications.

[37] Seoin Back, et al. Toward a Design of Active Oxygen Evolution Catalysts: Insights from Automated Density Functional Theory Calculations and Machine Learning, 2019, ACS Catalysis.

[38] P. Rinke, et al. Data-Driven Materials Science: Status, Challenges, and Perspectives, 2019, Advanced Science.

[39] Pengfei Chen, et al. Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models, 2019, ArXiv.

[40] Jacob R. Boes, et al. High-throughput calculations of catalytic properties of bimetallic alloy surfaces, 2019, Scientific Data.

[41] Seoin Back, et al. Convolutional Neural Network of Atomic Surface Structures to Predict Binding Energies for High-throughput Screening of Catalysts, 2019, The Journal of Physical Chemistry Letters.

[42] Ming Li, et al. Dissolution and degradation of nuclear grade cationic exchange resin by Fenton oxidation combining experimental results and DFT calculations, 2019, Chemical Engineering Journal.

[43] Georg Kresse, et al. Phase Transitions of Hybrid Perovskites Simulated by Machine-Learning Force Fields Trained on the Fly with Bayesian Inference, 2019, Physical Review Letters.

[44] M. Scheffler, et al. Beyond Scaling Relations for the Description of Catalytic Materials, 2019, ACS Catalysis.

[45] Thomas Bligaard, et al. Low-Scaling Algorithm for Nudged Elastic Band Calculations Using a Surrogate Machine Learning Model, 2018, Physical Review Letters.

[46] C. Bannwarth, et al. GFN2-xTB: An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions, 2018, Journal of Chemical Theory and Computation.

[47] Karsten Wedel Jacobsen, et al. Local Bayesian optimizer for atomic structures, 2018, Physical Review B.

[48] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.

[49] Zachary W. Ulissi, et al. Active learning across intermetallics to guide discovery of electrocatalysts for CO2 reduction and H2 evolution, 2018, Nature Catalysis.

[50] M. Chiarini, et al. Synthesis of 2-Acylindoles via Ag- and Cu-Catalyzed anti-Michael Hydroamination of β-(2-Aminophenyl)-α,β-ynones: Experimental Results and DFT Calculations, 2018, The Journal of Organic Chemistry.

[51] Wenqing Zhang, et al. Adsorption-energy-based activity descriptors for electrocatalysts in energy storage applications, 2018.

[52] Jaehoon Kim, et al. Active learning with non-ab initio input features toward efficient CO2 reduction catalysts, 2018, Chemical Science.

[53] Christopher Wolverton, et al. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments, 2018, Science Advances.

[54] Luke E. K. Achenie, et al. High-throughput screening of bimetallic catalysts enabled by machine learning, 2017.

[55] Klaus-Robert Müller, et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions, 2017, NIPS.

[56] Xianfeng Ma, et al. Orbitalwise Coordination Number for Predicting Adsorption Properties of Metal Nanocatalysts, 2017, Physical Review Letters.

[57] Klaus-Robert Müller, et al. Machine learning of accurate energy-conserving molecular force fields, 2016, Science Advances.

[58] Andrew A. Peterson, et al. Acceleration of saddle-point searches with machine learning, 2016, The Journal of Chemical Physics.

[59] A. Choudhary, et al. Perspective: Materials informatics and big data: Realization of the "fourth paradigm" of science in materials science, 2016.

[60] Muratahan Aykol, et al. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies, 2015.

[61] Min Ruan, et al. Design of a polypyrrole MIP-SAW sensor for selective detection of flumequine in aqueous media. Correlation between experimental results and DFT calculations, 2015.

[62] Gábor Csányi, et al. Gaussian approximation potentials: A brief tutorial introduction, 2015, arXiv:1502.01366.

[63] Pavlo O. Dral, et al. Quantum chemistry structures and properties of 134 kilo molecules, 2014, Scientific Data.

[64] Jun Chen, et al. A global potential energy surface for the H2 + OH ↔ H2O + H reaction using neural networks, 2013, The Journal of Chemical Physics.

[65] R. Kondor, et al. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons, 2009, Physical Review Letters.

[66] G. Schatz. The Journal of Physical Chemistry Letters, 2009.

[67] Ture R. Munter, et al. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces, 2007, Physical Review Letters.

[68] Michele Parrinello, et al. Generalized neural-network representation of high-dimensional potential-energy surfaces, 2007, Physical Review Letters.

[69] Yoshinobu Baba, et al. Structure-property correlation of CdSe clusters using experimental results and first-principles DFT calculations, 2006, Journal of the American Chemical Society.

[70] A. Gross, et al. Representing high-dimensional potential-energy surfaces for reactions at surfaces by neural networks, 2004.

[71] Adam Zemla, et al. LGA: a method for finding 3D similarities in protein structures, 2003, Nucleic Acids Res.

[72] M. Robert, et al. Radical anions of carbenes and carbene homologues. DFT study and preliminary experimental results, 2001.

[73] O. Isayev, et al. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost, 2017.