CSAR Scoring Challenge Reveals the Need for New Concepts in Estimating Protein-Ligand Binding Affinity

The accuracy of dG prediction by the Lead Finder docking software on the CSAR test set was characterized by R² = 0.62 and rmsd = 1.93 kcal/mol, and the method used to prepare the full-atom structures of the test set did not significantly affect the accuracy of the predictions. The primary factors determining the correlation between predicted and experimental values were van der Waals interactions and solvation effects; these two factors alone accounted for R² = 0.50. The other factors that affected prediction accuracy, listed in order of decreasing importance, were the change in the ligand's internal energy upon binding to the protein, electrostatic interactions, and hydrogen bonds. It appears that these latter factors contributed to the independence of the prediction results from the method of full-atom structure preparation. We then turned our attention to other factors that could potentially improve the scoring function and thereby raise the accuracy of dG prediction. Neither ligand-centric factors, including Mw, cLogP, PSA, etc., nor protein-centric factors, such as the functional class of the protein, improved the prediction accuracy. We next explored whether weak molecular interactions such as X-H...Ar, X-H...Hal, CO...Hal, C-H...X, stacking, and π-cationic interactions (where X is N or O), which are generally of interest to medicinal chemists despite their lack of proper molecular mechanical parametrization, could improve dG prediction. Our analysis revealed that, of these interactions, only CO...Hal is statistically significant for dG prediction with the Lead Finder scoring function. Accounting for the CO...Hal interaction reduced the rmsd from 2.19 to 0.69 kcal/mol for the corresponding structures. The other weak interactions were not statistically significant and were therefore irrelevant to the accuracy of dG prediction.
On the basis of our participation in the CSAR scoring challenge, we conclude that a significant increase in prediction accuracy necessitates breakthrough scoring approaches. We anticipate that explicit accounting for water molecules and protein flexibility, together with a more thermodynamically accurate method of dG calculation than single-point energy calculation, may lead to such breakthroughs.
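
The two accuracy measures quoted above, R² and rmsd, can be illustrated with a minimal sketch. It assumes that R² denotes the squared Pearson correlation between predicted and experimental dG, which the text does not state explicitly. The function names and the numerical values are hypothetical and are not taken from the CSAR set or from Lead Finder.

```python
import math


def r_squared(predicted, experimental):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the R^2 meant in the text; the source does not
    specify how R^2 was computed.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e)
              for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov * cov / (var_p * var_e)


def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental values."""
    n = len(predicted)
    squared_errors = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared_errors / n)


# Made-up binding free energies in kcal/mol, for illustration only.
experimental_dg = [-6.0, -8.0, -10.0, -12.0]
predicted_dg = [-7.0, -7.0, -11.0, -11.0]

print(f"R^2  = {r_squared(predicted_dg, experimental_dg):.2f}")
print(f"rmsd = {rmsd(predicted_dg, experimental_dg):.2f} kcal/mol")
```

The two measures capture different things: R² reflects how well the predictions rank and track the experimental values, whereas rmsd reflects the absolute error in kcal/mol, so a scoring function can improve on one without improving on the other.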
