libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications

There are many ways to represent a molecule as input to a machine learning model, and each retains certain kinds of information while discarding others. To preserve three-dimensional spatial information, including bond angles and torsions, we have developed libmolgrid, a general-purpose library for representing three-dimensional molecules as multidimensional arrays. The library also provides functionality for composing batches of data suited to machine learning workflows, including data augmentation, class balancing, and stratification of examples by a regression variable or data subgroup. It further supports temporal and spatial recurrences over these data to facilitate work with recurrent neural networks, dynamical data, and size-extensive modeling. libmolgrid is designed to integrate seamlessly with popular deep learning frameworks, including Caffe, PyTorch, and Keras. It achieves good performance by offloading computationally intensive tasks to graphics processing units (GPUs), and it uses memory efficiently by providing memory views over preallocated buffers. libmolgrid is a free, open-source, and actively supported project that serves the molecular modeling community's growing need for tools that streamline data ingestion, representation construction, and principled machine learning model development.
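
The core idea of representing a molecule as a multidimensional array, combined with random rotation and translation as data augmentation, can be illustrated with a minimal pure-Python sketch. The sketch rests on several assumptions. Each atom adds a Gaussian density of the form exp(-2 d^2 / r^2), truncated to zero at its radius r, to the channel for its atom type. Augmentation is a uniformly random rotation about the grid center followed by a random translation. The function names (`atom_density`, `random_rotation_matrix`, `augment`, `make_grid`), the density kernel, and the 0.5 Å resolution and 23.5 Å box defaults are hypothetical choices for illustration, not libmolgrid's API. The library itself performs this work on the GPU.

```python
import math
import random


def atom_density(dist, radius):
    # Assumed kernel for illustration: Gaussian exp(-2 d^2 / r^2), truncated to 0 at d >= r.
    if dist >= radius:
        return 0.0
    return math.exp(-2.0 * dist * dist / (radius * radius))


def random_rotation_matrix(rng):
    # Uniformly random rotation built from a random unit quaternion (Shoemake's method).
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, center, rng, max_translation=2.0):
    # Rotate every atom about the grid center, then shift all atoms by one shared random offset.
    rot = random_rotation_matrix(rng)
    shift = [rng.uniform(-max_translation, max_translation) for _ in range(3)]
    moved = []
    for p in coords:
        d = [p[i] - center[i] for i in range(3)]
        moved.append(tuple(center[r] + sum(rot[r][c] * d[c] for c in range(3)) + shift[r]
                           for r in range(3)))
    return moved


def make_grid(coords, types, radii, num_types, center, resolution=0.5, dimension=23.5):
    # Returns (buffer, n): one n x n x n density grid per atom type, stored in a single flat list.
    n = int(round(dimension / resolution)) + 1
    grid = [0.0] * (num_types * n ** 3)  # preallocated, zero-initialized buffer
    origin = [c - dimension / 2.0 for c in center]
    for atom, t, r in zip(coords, types, radii):
        # Visit only the grid points inside the atom's axis-aligned bounding box.
        lo = [max(0, math.floor((atom[a] - origin[a] - r) / resolution)) for a in range(3)]
        hi = [min(n - 1, math.ceil((atom[a] - origin[a] + r) / resolution)) for a in range(3)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    g = (origin[0] + i * resolution, origin[1] + j * resolution,
                         origin[2] + k * resolution)
                    grid[((t * n + i) * n + j) * n + k] += atom_density(math.dist(g, atom), r)
    return grid, n


if __name__ == "__main__":
    rng = random.Random(42)
    ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]  # toy two-atom "molecule"
    center = (0.7, 0.0, 0.0)
    posed = augment(ligand, center, rng)
    grid, n = make_grid(posed, types=[0, 1], radii=[1.7, 1.5], num_types=2, center=center)
    print(n, sum(grid))
```

The sketch stores the grid in a single flat, preallocated buffer indexed as ((type * n + i) * n + j) * n + k. This loosely mirrors the abstract's point about memory views over preallocated buffers, since a framework tensor can wrap such a buffer without copying it. The sketch applies augmentation to atomic coordinates before gridding rather than to the finished array, which avoids the resampling that rotating a voxelized grid would require.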
