Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets

There is a growing body of evidence that machine-learning regression yields more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim to optimize the affinity of ligands for a target depend on the accuracy of their predicted ranking. Yet despite their proven advantages, machine-learning scoring functions are still not widely applied, apparently because their properties are insufficiently understood and user-friendly software implementing them is lacking. Here we present a study in which the accuracy of AutoDock Vina, arguably the most commonly used docking software, is strongly improved by following a machine-learning approach. We also analyse the factors responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement will grow as more data becomes available for training Random Forest models, whereas regression models that assume additive functional forms do not improve with more training data. We discuss how this finding opens the door to new opportunities in scoring function development. To facilitate the translation of this advance into structure-based molecular design, we provide software that directly re-scores Vina-generated poses and thus strongly improves their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz
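The contrast drawn above between Random Forest and additive regression models can be illustrated with a minimal sketch. It is not the authors' software or benchmark: the data are synthetic, the six feature columns merely stand in for per-pose energy terms, and the function names (`make_complexes`, `learning_curve`) are hypothetical. The target deliberately contains a product term, so an additive (linear) model cannot represent it however much data it sees, while a Random Forest from scikit-learn can approximate it better as the training set grows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def make_complexes(n, rng):
    """Synthetic stand-in for (features, affinity) pairs; NOT real binding data.

    The target mixes one additive term with one product (non-additive) term.
    """
    X = rng.uniform(-1.0, 1.0, size=(n, 6))
    y = X[:, 0] + 2.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=n)
    return X, y


def rmse(model, X, y):
    """Root-mean-square error of a fitted model on (X, y)."""
    return float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))


def learning_curve(sizes=(100, 400, 1600), seed=0):
    """Return (n_train, linear_rmse, forest_rmse) for each training-set size."""
    rng = np.random.default_rng(seed)
    X_test, y_test = make_complexes(1000, rng)  # fixed held-out test set
    rows = []
    for n in sizes:
        X, y = make_complexes(n, rng)
        linear = LinearRegression().fit(X, y)
        forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
        rows.append((n, rmse(linear, X_test, y_test), rmse(forest, X_test, y_test)))
    return rows


if __name__ == "__main__":
    for n, lin_err, rf_err in learning_curve():
        print(f"n_train={n:5d}  linear RMSE={lin_err:.3f}  forest RMSE={rf_err:.3f}")
```

On this toy problem the linear model's test error should plateau near the size of the unmodelled product term, while the forest's error should keep falling as the training set grows. Whether the same holds on real protein-ligand data is what the benchmark described above is meant to establish.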
