Predicting Protein-Ligand Binding Affinities Using Novel Geometrical Descriptors and Machine-Learning Methods

Inspired by knowledge-based scoring functions, this work introduces a new quantitative structure-activity relationship (QSAR) approach for scoring protein-ligand interactions. The approach assumes that the strength of ligand binding correlates with the nature of specific ligand/binding-site atom pairs in a distance-dependent manner. Atom-pair occurrence counts and distance-dependent atom-pair features are used to generate an interaction score. Scoring and pattern-recognition results obtained with kernel partial least squares (Kernel PLS) modeling and a genetic algorithm-based feature selection method are discussed.
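
As an illustration of what atom-pair occurrence and distance-dependent atom-pair features might look like, the following is a minimal sketch and not the descriptor definition used in the paper. The atom typing (plain element symbols), the distance bin edges, and the function names `atom_pair_features` and `pair_occurrence` are all assumptions made for this example.

```python
import math
from collections import Counter

def atom_pair_features(ligand_atoms, site_atoms, bin_edges=(0.0, 2.0, 4.0, 6.0)):
    """Count ligand/site atom-type pairs per distance bin.

    Atoms are (type, (x, y, z)) tuples. The returned Counter is keyed by
    (ligand_type, site_type, bin_index). The bin edges (in angstroms) are
    hypothetical; pairs outside the last edge are ignored.
    """
    counts = Counter()
    for l_type, l_xyz in ligand_atoms:
        for s_type, s_xyz in site_atoms:
            d = math.dist(l_xyz, s_xyz)  # Euclidean distance (Python 3.8+)
            for i in range(len(bin_edges) - 1):
                if bin_edges[i] <= d < bin_edges[i + 1]:
                    counts[(l_type, s_type, i)] += 1
                    break
    return counts

def pair_occurrence(features):
    """Collapse the distance bins into plain atom-pair occurrence counts."""
    occ = Counter()
    for (l_type, s_type, _bin), n in features.items():
        occ[(l_type, s_type)] += n
    return occ

# Toy complex with made-up coordinates (assumption, not real structural data).
ligand = [("C", (0.0, 0.0, 0.0)), ("O", (1.5, 0.0, 0.0))]
site = [("N", (3.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0))]

features = atom_pair_features(ligand, site)
occurrence = pair_occurrence(features)
```

Each complex then maps to a fixed-length vector (one entry per atom-pair type and distance bin), which is the kind of input a regression model such as Kernel PLS could be trained on, with feature selection deciding which entries to keep.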
