A knowledge-guided strategy for improving the accuracy of scoring functions in binding affinity prediction

Background: Current scoring functions are not very successful at predicting protein-ligand binding affinity, despite their popularity in structure-based drug design. Here, we propose a general knowledge-guided scoring (KGS) strategy to tackle this problem. Our KGS strategy computes the binding constant of a given protein-ligand complex from the known binding constant of an appropriate reference complex. Finding the reference complex requires a good training set containing a sufficient number of protein-ligand complexes with known binding data. The reference complex must share a similar pattern of key protein-ligand interactions with the complex of interest. As a result, some uncertain factors in protein-ligand binding may cancel out, giving a more accurate prediction of absolute binding constants.

Results: In our study, an automatic algorithm was developed that summarizes key protein-ligand interactions as a pharmacophore model and identifies the reference complex with maximal similarity to the query complex. Our KGS strategy was evaluated in combination with two scoring functions (X-Score and PLP) on three test sets containing 112 HIV protease complexes, 44 carbonic anhydrase complexes, and 73 trypsin complexes, respectively. Our results, obtained on crystal structures as well as on computer-generated docking poses, indicated that applying the KGS strategy produced more accurate predictions, especially when X-Score or PLP alone did not perform well.

Conclusions: Unlike other targeted scoring functions, our KGS strategy does not require any re-parameterization or modification of current scoring methods, and its application is not tied to particular systems. In theory, the effectiveness of our KGS strategy should grow with the ever-increasing body of experimental protein-ligand binding data. Our KGS strategy may therefore serve as a more practical remedy for improving the accuracy of current scoring functions in binding affinity prediction.
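
The reference-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that (a) each complex's key interactions are reduced to a set of hypothetical feature labels, (b) similarity is measured with a Tanimoto coefficient over those sets as a simple stand-in for the pharmacophore comparison mentioned above, and (c) the prediction is the reference's experimental value shifted by the difference in raw scores, which is one plausible reading of how errors shared by the two complexes could cancel. The names `Complex`, `kgs_predict`, and `min_similarity`, and the fallback to the raw score, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Complex:
    """A protein-ligand complex (hypothetical container for this sketch)."""
    name: str
    features: FrozenSet[str]          # labels for key interactions (assumed encoding)
    score: float                      # raw scoring-function output, in pKd units
    pkd_exp: Optional[float] = None   # experimental value, known for training complexes


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto similarity of two feature sets (stand-in for pharmacophore similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kgs_predict(query: Complex, training_set: Iterable[Complex],
                min_similarity: float = 0.5) -> float:
    """Predict pKd for `query` using the most similar training complex as reference.

    Assumed correction: pKd(query) = pKd_exp(ref) + (score(query) - score(ref)).
    If no training complex is similar enough, fall back to the raw score.
    """
    candidates = [c for c in training_set if c.pkd_exp is not None]
    if not candidates:
        return query.score
    ref = max(candidates, key=lambda c: tanimoto(query.features, c.features))
    if tanimoto(query.features, ref.features) < min_similarity:
        return query.score
    # Errors the scoring function makes on both complexes cancel in the difference.
    return ref.pkd_exp + (query.score - ref.score)


if __name__ == "__main__":
    # Feature labels and numbers below are made up for illustration only.
    training = [
        Complex("ref_A", frozenset({"hb:ASP25", "hyd:ILE50", "hb:ASP29"}), 6.0, 8.0),
        Complex("ref_B", frozenset({"metal:ZN", "hb:THR199"}), 5.0, 7.0),
    ]
    query = Complex("query", frozenset({"hb:ASP25", "hyd:ILE50"}), 6.5)
    print(kgs_predict(query, training))
```

In this toy case the query shares two of three features with `ref_A`, so `ref_A` is chosen as the reference and its experimental value is shifted by the score difference. The similarity threshold is a design choice of the sketch: below it, borrowing a reference is more likely to add error than to cancel it, so the raw score is returned unchanged.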
