A Bayesian statistical approach of improving knowledge‐based scoring functions for protein–ligand interactions

Knowledge‐based scoring functions are widely used for assessing putative complexes in protein–ligand and protein–protein docking and for structure prediction. Even with large training sets, knowledge‐based scoring functions face the inevitable problem of sparse data. Here, we have developed a novel approach for handling the sparse data problem that is based on estimating the inaccuracies in knowledge‐based scoring functions. This inaccuracy estimation is used to automatically weight the knowledge‐based scoring function with an alternative, force‐field‐based potential (FFP) that does not rely on training data and can, therefore, provide an improved approximation of the interactions between rare chemical groups. The current version of STScore, a protein–ligand scoring function using our method, achieves a binding mode prediction success rate of 91% on the set of 100 complexes by Wang et al., and a binding affinity correlation of 0.514 with the experimentally determined affinities in PDBbind. The method presented here may be used with other FFPs and other knowledge‐based scoring functions and can also be applied to protein–protein docking and protein structure prediction. © 2014 Wiley Periodicals, Inc.
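
The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea: trust the knowledge-based term where training counts are plentiful and fall back on a force-field term where they are sparse. The count-based shrinkage weight `kb_weight`, its strength parameter `m`, the use of a 12-6 Lennard-Jones term as the FFP, and all function names are assumptions made for illustration; they are not taken from STScore.

```python
import math

KT = 0.593  # kcal/mol near 298 K


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy, standing in here for the force-field potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def inverse_boltzmann(n_obs, n_exp):
    """Knowledge-based energy from observed vs. expected contact counts."""
    return -KT * math.log(n_obs / n_exp)


def kb_weight(n_obs, m=50.0):
    """Hypothetical reliability weight: near 1 for large counts, 0 for no data."""
    return n_obs / (n_obs + m)


def blended_energy(n_obs, n_exp, r, epsilon, sigma, m=50.0):
    """Blend the two potentials according to how much data backs the first."""
    u_ffp = lennard_jones(r, epsilon, sigma)
    if n_obs == 0:
        # No observations at all: the knowledge-based term is undefined,
        # so rely entirely on the force-field term.
        return u_ffp
    w = kb_weight(n_obs, m)
    return w * inverse_boltzmann(n_obs, n_exp) + (1.0 - w) * u_ffp


if __name__ == "__main__":
    # A well-sampled atom pair versus a rare one at the same distance.
    for n_obs, n_exp in [(5000, 2500.0), (3, 1.5)]:
        print(n_obs, blended_energy(n_obs, n_exp, r=4.0, epsilon=0.1, sigma=3.5))
```

Both atom pairs in the usage loop have the same observed-to-expected ratio, so their knowledge-based energies are identical; only the weight differs, which pulls the sparsely sampled pair toward the force-field value.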

[1]  Renxiao Wang, et al. The PDBbind database: methodologies and updates. Journal of medicinal chemistry, 2005.

[2]  Brian K Shoichet, et al. Prediction of protein-ligand interactions. Docking and scoring: successes and gaps. Journal of medicinal chemistry, 2006.

[3]  H. Scheraga, et al. Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules, 1976.

[4]  U. Singh, et al. A new force field for molecular mechanical simulation of nucleic acids and proteins. 1984.

[5]  Janet M. Thornton, et al. BLEEP—potential of mean force describing protein–ligand interactions: I. Generating potential. 1999.

[6]  Xiaoqin Zou, et al. An iterative knowledge‐based scoring function to predict protein–ligand interactions: II. Validation of the scoring function. J. Comput. Chem., 2006.

[7]  S. Altschul, et al. Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proceedings of the National Academy of Sciences of the United States of America, 1994.

[8]  J. Andrew Grant, et al. A smooth permittivity function for Poisson–Boltzmann solvation methods. J. Comput. Chem., 2001.

[9]  G. Klebe, et al. DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. Journal of medicinal chemistry, 2005.

[10]  K. Dill, et al. Statistical potentials extracted from protein structures: how accurate are they? Journal of molecular biology, 1996.

[11]  R. Jernigan, et al. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. 1985.

[12]  M. Sippl. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. Journal of molecular biology, 1990.

[13]  A. Beyer, et al. An improved pair potential to recognize native protein folds. Proteins, 1994.

[14]  Brian K. Shoichet, et al. Statistical Potential for Modeling and Ranking of Protein-Ligand Interactions. J. Chem. Inf. Model., 2011.

[15]  Nathan A. Baker, et al. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[16]  Renxiao Wang, et al. Comparative evaluation of 11 scoring functions for molecular docking. Journal of medicinal chemistry, 2003.

[17]  P. Kollman, et al. An all atom force field for simulations of proteins and nucleic acids. Journal of computational chemistry, 1986.

[18]  Emil Alexov, et al. Rapid grid‐based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: Applications to the molecular systems and geometric objects. J. Comput. Chem., 2002.

[19]  P. Kollman, et al. Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annual review of biophysics and biomolecular structure, 2001.

[20]  K. Dill, et al. An iterative method for extracting energy-like quantities from protein structures. Proceedings of the National Academy of Sciences of the United States of America, 1996.

[21]  I. Kuntz, et al. Automated docking with grid‐based energy evaluation. 1992.

[22]  Shaomeng Wang, et al. M-score: a knowledge-based potential scoring function accounting for protein atom mobility. Journal of medicinal chemistry, 2006.

[23]  Xiaoqin Zou, et al. Inclusion of Solvation and Entropy in the Knowledge-Based Scoring Function for Protein-Ligand Interactions. J. Chem. Inf. Model., 2010.

[24]  G. Klebe, et al. Knowledge-based scoring function to predict protein-ligand interactions. Journal of molecular biology, 2000.

[25]  M. Sippl, et al. Helmholtz free energies of atom pair interactions in proteins. Folding & design, 1996.

[26]  Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1998.

[27]  Natasja Brooijmans, et al. Molecular recognition and docking algorithms. Annual review of biophysics and biomolecular structure, 2003.

[28]  Y. Martin, et al. A general and fast scoring function for protein-ligand interactions: a simplified potential approach. Journal of medicinal chemistry, 1999.

[29]  A. Edmundson, et al. Local and transmitted conformational changes on complexation of an anti-sweetener Fab. Journal of molecular biology, 1994.

[30]  Paul D Lyne, et al. Structure-based virtual screening: an overview. Drug discovery today, 2002.

[31]  Renxiao Wang, et al. The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of medicinal chemistry, 2004.

[32]  J. E. Jones. On the determination of molecular fields. —II. From the equation of state of a gas. 1924.

[33]  David S. Goodsell, et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. 1998.

[34]  Xiaoqin Zou, et al. Advances and Challenges in Protein-Ligand Docking. International journal of molecular sciences, 2010.

[35]  M. Gilson, et al. Calculation of protein-ligand binding affinities. Annual review of biophysics and biomolecular structure, 2007.

[36]  Sheng-You Huang, et al. An iterative knowledge‐based scoring function to predict protein–ligand interactions: I. Derivation of interaction potentials. J. Comput. Chem., 2006.

[37]  P. Hajduk, et al. Evaluation of PMF scoring in docking weak ligands to the FK506 binding protein. Journal of medicinal chemistry, 1999.

[38]  Janet M. Thornton, et al. BLEEP—potential of mean force describing protein–ligand interactions: II. Calculation of binding energies and comparison with experimental data. 1999.

[39]  Song Liu, et al. A knowledge-based energy function for protein-ligand, protein-protein, and protein-DNA complexes. Journal of medicinal chemistry, 2005.

[40]  W. C. Still, et al. Semianalytical treatment of solvation for molecular mechanics and dynamics. 1990.

[41]  Yaoqi Zhou, et al. An all‐atom knowledge‐based energy function for protein‐DNA threading, docking decoy discrimination, and prediction of transcription‐factor binding profiles. Proteins, 2009.

[42]  Xiaoqin Zou, et al. Scoring functions and their evaluation methods for protein-ligand docking: recent advances and future directions. Physical chemistry chemical physics : PCCP, 2010.