Molecule3D: A Benchmark for Predicting 3D Geometries from Molecular Graphs

Graph neural networks are emerging as promising methods for modeling molecular graphs, in which nodes and edges correspond to atoms and chemical bonds, respectively. Recent studies show that molecular property prediction becomes more accurate when 3D molecular geometries, such as bond lengths and angles, are available. However, computing 3D molecular geometries requires quantum calculations that are computationally prohibitive. For example, accurately calculating the 3D geometry of a single small molecule with density functional theory (DFT) requires hours of computing time. Here, we propose to predict ground-state 3D geometries from molecular graphs using machine learning methods. To make this feasible, we develop a benchmark, named Molecule3D, that includes a dataset of precise ground-state geometries for approximately 4 million molecules derived from DFT. We also provide a set of software tools for data processing, splitting, training, and evaluation. Specifically, we propose to assess the error and validity of predicted geometries using four metrics. We implement two baseline methods that predict either the pairwise distances between atoms or the atom coordinates in 3D space. Experimental results show that, compared with generating 3D geometries with RDKit, our methods achieve comparable prediction accuracy at a much lower computational cost. Molecule3D is available as a module of the MoleculeX software library (https://github.com/divelab/MoleculeX).

Preprint. Under review. arXiv:2110.01717v1 [cs.LG] 30 Sep 2021
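As an illustration of how the error of a predicted geometry might be scored, the following is a minimal sketch, not the benchmark's implementation. The abstract does not name its four metrics, so the sketch assumes two common choices, the mean absolute error (MAE) and root mean square error (RMSE) over pairwise atom distances. The function names `pairwise_distances` and `distance_errors` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return the Euclidean distance for every unordered atom pair (i < j)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_errors(pred_coords, true_coords):
    """Return (MAE, RMSE) between predicted and reference pairwise distances.

    Assumed metrics for illustration only; the benchmark's own four
    metrics are not specified in the abstract.
    """
    pred = pairwise_distances(pred_coords)
    true = pairwise_distances(true_coords)
    if len(pred) != len(true) or not pred:
        raise ValueError("geometries must have the same number of atoms (>= 2)")
    diffs = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Toy three-atom example with made-up coordinates (not taken from the dataset).
true_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred_coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.0, 0.0)]
mae, rmse = distance_errors(pred_coords, true_coords)
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

Comparing pairwise distances rather than raw coordinates means the score does not change when a predicted structure is rotated or translated, so no alignment step is needed before evaluation.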
