GNINA 1.0: molecular docking with deep learning

Molecular docking computationally predicts the conformation of a small molecule when bound to a receptor. Scoring functions are a vital piece of any molecular docking pipeline because they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which uses an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, evaluated as the percentage of targets where the top-ranked pose has a root mean square deviation (RMSD) of less than 2 Å from the known binding pose (Top1), is compared to AutoDock Vina scoring with either explicitly defined binding pockets or whole-protein docking. Gnina, using a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the RMSD to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
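To make the Top1 metric concrete, the following is a minimal Python sketch of how it could be computed from ranked docking output. It is an illustration under stated assumptions, not the evaluation code used for Gnina: the function names `rmsd` and `top1_percentage`, the input layout (a dictionary mapping each target to its pose RMSD values, best-scored pose first), and the example numbers are all hypothetical. The RMSD helper assumes both poses list the same atoms in the same order and applies no symmetry correction.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two poses.

    Assumption: both poses list the same atoms in the same order as
    (x, y, z) tuples in angstroms; no symmetry correction is applied.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def top1_percentage(ranked_rmsds_by_target, threshold=2.0):
    """Percentage of targets whose top-ranked pose is below the RMSD threshold.

    ranked_rmsds_by_target: dict mapping a target name to a list of pose
    RMSDs (angstroms) ordered by the scoring function, best-scored first.
    Targets with no poses count as failures.
    """
    if not ranked_rmsds_by_target:
        raise ValueError("at least one target is required")
    hits = sum(
        1
        for rmsds in ranked_rmsds_by_target.values()
        if rmsds and rmsds[0] < threshold
    )
    return 100.0 * hits / len(ranked_rmsds_by_target)


# Hypothetical example: four targets with made-up RMSD values.
example = {
    "target_a": [1.2, 3.4],  # top pose under 2 angstroms: success
    "target_b": [2.5, 0.8],  # a good pose exists but is ranked second: failure
    "target_c": [0.5],       # success
    "target_d": [4.0, 1.9],  # failure
}
print(top1_percentage(example))
```

Note that only the first-ranked pose counts: `target_b` contains a pose under 2 Å, yet it is scored as a failure because the scoring function ranked it second. This is why a better scoring function can raise Top1 without any change to pose sampling.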
