Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets

Abstract

Background: Computational approaches have emerged as an instrumental methodology in modern research. For example, virtual screening by molecular docking is routinely used in computer-aided drug discovery. One of the critical parameters for ligand docking is the size of the search space used to identify low-energy binding poses of drug candidates. Currently available docking packages often come with a default protocol for calculating the box size; however, many of these procedures have not been systematically evaluated.

Methods: In this study, we investigate how the docking accuracy of AutoDock Vina is affected by the selection of a search space. We propose a new procedure for calculating the optimal docking box size that maximizes the accuracy of binding pose prediction against a non-redundant and representative dataset of 3,659 protein-ligand complexes selected from the Protein Data Bank. Subsequently, we use the Directory of Useful Decoys, Enhanced to demonstrate that the optimized docking box size also yields an improved ranking in virtual screening. Binding pockets in both datasets are derived from the experimental complex structures and, additionally, predicted by eFindSite.

Results: A systematic analysis of ligand binding poses generated by AutoDock Vina shows that the highest accuracy is achieved when the dimensions of the search space are 2.9 times the radius of gyration of a docking compound. Subsequent virtual screening benchmarks demonstrate that this optimized docking box size also improves compound ranking. For instance, using predicted ligand binding sites, the average enrichment factor calculated for the top 1% (10%) of the screening library is 8.20 (3.28) for the optimized protocol, compared to 7.67 (3.19) for the default procedure. Depending on the evaluation metric, the optimal docking box size gives better ranking in virtual screening for about two-thirds of target proteins.

Conclusions: This fully automated procedure can be used to optimize docking protocols in order to improve the ranking accuracy in production virtual screening simulations. Importantly, the optimized search space systematically yields better results than the default method not only for experimental pockets, but also for those predicted from protein structures. A script for calculating the optimal docking box size is freely available at www.brylinski.org/content/docking-box-size.

Graphical Abstract: We developed a procedure to optimize the box size in molecular docking calculations. The left panel shows the predicted binding pose of NADP (green sticks) compared to the experimental complex structure of human aldose reductase (blue sticks) using a default protocol. The right panel shows the docking accuracy using an optimized box size.
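The box-size rule reported in the Results can be illustrated with a short Python sketch. This is not the authors' script. It assumes an unweighted (geometric) radius of gyration over the ligand atom coordinates, a cubic box, and a box center supplied by the user (for example, the pocket center). The function names `radius_of_gyration`, `docking_box_size`, and `vina_box_config` are hypothetical. The `center_*` and `size_*` keys are the standard AutoDock Vina configuration options.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration (in the units of the input, e.g. angstroms).

    Assumption: every atom has equal weight; a mass-weighted variant
    would give a slightly different value.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("at least one atom coordinate is required")
    # Geometric center of the ligand
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Mean squared distance of the atoms from the center
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def docking_box_size(coords, scale=2.9):
    """Edge length of a cubic search space: scale * radius of gyration.

    The default scale of 2.9 is the ratio reported in the abstract.
    """
    return scale * radius_of_gyration(coords)


def vina_box_config(center, size):
    """Format the search-space lines of an AutoDock Vina config file.

    `center` is an (x, y, z) tuple chosen by the user (assumption: the
    pocket center); `size` is the cubic box edge length.
    """
    cx, cy, cz = center
    lines = [
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {size:.3f}",
        f"size_y = {size:.3f}",
        f"size_z = {size:.3f}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy two-atom "ligand" used only to exercise the functions
    ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    size = docking_box_size(ligand)
    print(vina_box_config((0.0, 0.0, 0.0), size))
```

In practice the coordinates would be read from the ligand structure file, and the resulting lines appended to the Vina configuration used for each compound, so that the search space scales with the size of the molecule being docked.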
