Molecular Geometry Prediction using a Deep Generative Graph Neural Network

A molecule’s geometry, also known as its conformation, is one of its most important properties: it determines the reactions the molecule participates in, the bonds it forms, and its interactions with other molecules. Conventional conformation generation methods minimize hand-designed molecular force field energy functions, which are often poorly correlated with the true energy function of a molecule observed in nature. These methods generate geometrically diverse sets of conformations, some of which are very similar to the lowest-energy conformations and others of which are very different. In this paper, we propose a conditional deep generative graph neural network that learns an energy function in a data-driven manner, by directly learning to generate molecular conformations that are energetically favorable and more likely to be observed experimentally. On three large-scale datasets containing small molecules, we show that our method generates a set of conformations that on average is far more likely to be close to the corresponding reference conformations than those obtained from conventional force field methods. Our method maintains geometrical diversity by generating conformations that are not too similar to each other, and it is also computationally faster. We also show that our method can be used to provide initial coordinates for conventional force field methods. On one of the evaluated datasets, we show that this combination captures the best of both methods: the generated conformations are on average close to the reference conformations, and some are very similar to them.
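As an illustration of what "close to the corresponding reference conformations" could mean in practice, the sketch below scores a set of generated conformations against a reference. The abstract does not name the distance measure, so this assumes root-mean-square deviation (RMSD) after optimal rigid alignment with the Kabsch algorithm, a common choice for comparing conformations. The function names `kabsch_rmsd` and `summarize` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate arrays after optimal
    rigid alignment of P onto Q (Kabsch algorithm).

    Assumes both arrays list the same atoms in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both point sets at the origin.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P and measure the remaining per-atom deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


def summarize(conformations, reference):
    """Mean and minimum RMSD of a set of conformations to a reference.

    The mean reflects how close the set is on average; the minimum
    reflects whether any single conformation is very similar.
    """
    rmsds = np.array([kabsch_rmsd(c, reference) for c in conformations])
    return float(rmsds.mean()), float(rmsds.min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(6, 3))  # toy coordinates, not real data
    # Hypothetical "generated" set: perturbed copies of the reference.
    generated = [reference + rng.normal(scale=s, size=reference.shape)
                 for s in (0.05, 0.2, 0.5)]
    mean_rmsd, min_rmsd = summarize(generated, reference)
    print(f"mean RMSD: {mean_rmsd:.3f}, min RMSD: {min_rmsd:.3f}")
```

Reporting both the mean and the minimum mirrors the distinction the abstract draws between conformations that are close on average and sets that contain a few very similar ones.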
