Convolutional neural network scoring and minimization in the D3R 2017 community challenge

We assess the ability of our convolutional neural network (CNN)-based scoring functions to perform several common tasks in drug discovery. These include distinguishing ligand poses that are near the true binding mode from those that are far from it, given a set of reference receptors, and classifying ligands as active or inactive using structural information. We use the CNN to re-score or refine poses generated with a conventional scoring function, AutoDock Vina, and compare the performance of each approach with that of the conventional scoring function alone. Furthermore, we assess several ways of choosing appropriate reference receptors in the context of the D3R 2017 community benchmarking challenge. We find that our CNN scoring function outperforms Vina on most tasks without requiring manual inspection by a knowledgeable operator, but that the pose prediction target chosen for the challenge, Cathepsin S, was particularly difficult for de novo docking. Nevertheless, the CNN provided best-in-class performance on several virtual screening tasks, underscoring the relevance of deep learning to the field of drug discovery.
