Development of the Knowledge-Based and Empirical Combined Scoring Algorithm (KECSA) To Score Protein-Ligand Interactions

We describe a novel knowledge-based protein-ligand scoring function that employs a new definition for the reference state, allowing us to relate a statistical potential to a Lennard-Jones (LJ) potential. In this way, the LJ potential parameters were generated from protein-ligand complex structural data contained in the Protein Data Bank (PDB). Forty-nine types of atomic pairwise interactions were derived using this method, which we call the knowledge-based and empirical combined scoring algorithm (KECSA). Two validation benchmarks were introduced to test the performance of KECSA. The first benchmark included two test sets that address the training set and the enthalpy/entropy of KECSA. The second benchmark included two large-scale and five small-scale test sets, used to compare the reproducibility of KECSA with that of two empirical scoring functions previously developed in our laboratory (LISA and LISA+), as well as with other well-known scoring methods. Validation results illustrate that KECSA shows improved performance in all test sets when compared with other scoring methods, especially in its ability to minimize the root mean square error (RMSE). LISA and LISA+ displayed similar performance for some of the small test sets when the correlation coefficient and Kendall τ were used as the metrics of quality. Further pathways for improvement are discussed that would allow KECSA to be more sensitive to subtle changes in ligand structure.
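
The three quality metrics named above (RMSE, correlation coefficient, and Kendall τ) can be illustrated with a minimal sketch. It assumes that "correlation coefficient" means the Pearson coefficient and uses the simple tau-a form of Kendall τ, in which tied pairs count as neither concordant nor discordant; the abstract does not state which variants were used. The function names and the affinity values are hypothetical and are not taken from the KECSA benchmarks.

```python
import math
from itertools import combinations


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of 'correlation coefficient')."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie and is counted in neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical binding affinities (for example, pKd units), for illustration only.
experimental = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.5, 6.0, 8.0]

print("RMSE:", rmse(predicted, experimental))
print("Pearson r:", pearson_r(predicted, experimental))
print("Kendall tau-a:", kendall_tau_a(predicted, experimental))
```

RMSE measures the absolute deviation from the experimental values, whereas the Pearson coefficient and Kendall τ measure only linear agreement and rank agreement, so two scoring functions can rank ligands similarly while differing noticeably in RMSE.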
