Towards out-of-distribution generalizable predictions of chemical kinetics properties

Machine learning (ML) techniques have found applications in estimating chemical kinetics properties. With a growing number of drug molecules identified through "AI4drug discovery", the next imperative is AI-driven design of high-throughput chemical synthesis processes, which requires estimating the properties of unseen reactions involving unexplored molecules. To this end, existing ML approaches for kinetics property prediction must be out-of-distribution (OOD) generalizable. In this paper, we categorize OOD kinetics property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) state-of-the-art graph OOD methods on kinetics property prediction problems. Our results demonstrate the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.
