A new protein binding pocket similarity measure based on comparison of clouds of atoms in 3D: application to ligand prediction

Background: Predicting which molecules can bind to a given binding site of a protein with known 3D structure is important for deciphering the protein's function, and useful in drug design. A classical assumption in structural biology is that proteins with similar 3D structures have related molecular functions, and therefore may bind similar ligands. However, proteins that do not display any overall sequence or structure similarity may also bind similar ligands if they contain similar binding sites. Quantitatively assessing the similarity between binding sites may therefore be useful to propose new ligands for a given pocket, based on those known for similar pockets.

Results: We propose a new method to quantify the similarity between binding pockets, and explore its relevance for ligand prediction. We represent each pocket by a cloud of atoms, and assess the similarity between two pockets by aligning their atoms in 3D space and comparing the resulting configurations with a convolution kernel. Pocket alignment and comparison are possible even when the corresponding proteins share no sequence or overall structure similarity. To predict ligands for a given target pocket, we compare it to an ensemble of pockets with known ligands and identify the most similar pockets. We discuss two criteria to evaluate the performance of a binding pocket similarity measure in the context of ligand prediction, namely the area under the ROC curve (AUC) and classification-based scores. We show that the latter are better suited to evaluating methods with respect to ligand prediction, and demonstrate the relevance of our new binding site similarity measure compared to existing ones.

Conclusions: This study demonstrates the relevance of the proposed method for identifying ligands that bind to known binding pockets. We also provide a new benchmark for future work in this field. The new method and the benchmark are available at http://cbio.ensmp.fr/paris/.
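The idea of comparing two atom clouds with a convolution kernel and then ranking pockets with known ligands can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two clouds have already been superposed (the alignment step is omitted), it uses a plain Gaussian kernel on atom coordinates only, and the names `cloud_kernel`, `similarity`, `predict_ligand` and the bandwidth `sigma` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cloud_kernel(a: Sequence[Point], b: Sequence[Point], sigma: float = 1.0) -> float:
    """Sum of Gaussian kernels over all atom pairs of two clouds.

    Assumption: the clouds are already expressed in a common frame.
    """
    total = 0.0
    for p in a:
        for q in b:
            d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def similarity(a: Sequence[Point], b: Sequence[Point], sigma: float = 1.0) -> float:
    """Normalized kernel, so that a cloud compared with itself scores 1."""
    norm = math.sqrt(cloud_kernel(a, a, sigma) * cloud_kernel(b, b, sigma))
    return cloud_kernel(a, b, sigma) / norm


def predict_ligand(
    target: Sequence[Point],
    library: List[Tuple[Sequence[Point], str]],
    sigma: float = 1.0,
) -> str:
    """Return the ligand label of the library pocket most similar to the target."""
    best_pocket, best_ligand = max(
        library, key=lambda item: similarity(target, item[0], sigma)
    )
    return best_ligand


# Toy data (made up): two library pockets and a slightly perturbed query.
pocket_a: List[Point] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pocket_b: List[Point] = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
query: List[Point] = [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.1)]
library = [(pocket_a, "ATP"), (pocket_b, "HEM")]
```

Calling `predict_ligand(query, library)` on this toy data picks the label attached to `pocket_a`, since the query is a small perturbation of it. A fuller treatment would also search over rotations and translations before evaluating the kernel, and could weight atom pairs by additional atom properties.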
