Supervised Consensus Scoring for Docking and Virtual Screening

Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve performance by compensating for the deficiencies of each individual scoring function. Nevertheless, conventional CS shares the problems of the existing scoring functions, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although current scoring functions have many problems, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes the protein-ligand binding process into account by using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-Score). The success rates of SCS range from 89% to 91% at RMSD < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we introduce a method for judging whether a compound is active or inactive with an appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.
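
To make the terms in the abstract concrete, the following is a minimal sketch of a conventional rank-sum consensus over several scoring functions, together with the success-rate metric at an RMSD cutoff of 2 Å. It is not the SCS method proposed here, which additionally uses unbound ligand conformations and supervised learning. The function names, the rank-sum combination rule, and the convention that a lower score is better are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_poses(scores: List[float]) -> List[int]:
    """Return the rank of each pose (0 = best), assuming lower score is better."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def consensus_best_pose(score_table: Dict[str, List[float]]) -> int:
    """Pick the pose with the lowest summed rank across scoring functions.

    score_table maps a (hypothetical) scoring-function name to one score per pose.
    Ties are broken in favor of the pose with the lower index.
    """
    n_poses = len(next(iter(score_table.values())))
    rank_sum = [0] * n_poses
    for scores in score_table.values():
        for i, r in enumerate(rank_poses(scores)):
            rank_sum[i] += r
    return min(range(n_poses), key=lambda i: rank_sum[i])


def success_rate(best_pose_rmsds: List[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose selected pose has RMSD below the cutoff (in Å)."""
    hits = sum(1 for r in best_pose_rmsds if r < cutoff)
    return hits / len(best_pose_rmsds)


if __name__ == "__main__":
    # Made-up scores for three poses of one ligand; not data from the paper.
    table = {
        "f1": [-5.0, -7.0, -6.0],
        "f2": [-3.0, -3.5, -4.0],
        "f3": [-1.0, -3.0, -2.0],
    }
    print(consensus_best_pose(table))
    # Made-up RMSD values (Å) of the selected pose for four complexes.
    print(success_rate([0.8, 1.5, 2.4, 3.1]))
```

Summing ranks rather than raw scores avoids having to put scoring functions with different units and scales on a common footing, which is one common reason for using rank-based consensus schemes.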
