Identification of Pockets on Protein Surface to Predict Protein–Ligand Binding Sites

Proteins perform their biological functions in cellular processes mainly by interacting with other molecules such as other proteins, ligands, DNA and RNA. Only some of the residues in a protein are involved in such interactions, so identifying these interacting residues is of great importance for understanding protein function. Among these interactions, those between proteins and ligands have been widely studied in protein-ligand docking, virtual screening and structure-based drug design. The protein surface contains a number of cavities, or pocket sites, where small molecules might bind, so identifying such pocket sites is often the first step in protein-ligand binding site prediction. Many computational algorithms and tools have been developed in recent decades to predict protein-ligand binding sites by identifying pockets on protein structures, including POCKET (Levitt and Banaszak 1992), LIGSITE (Hendlich et al. 1997), CAST (Dundas et al. 2006; Binkowski et al. 2003), LIGSITEcsc (Huang and Schroeder 2006), PASS (Brady and Stouten 2000), Q-SiteFinder (Laurie and Jackson 2005), SURFNET (Laskowski 1995), Fpocket (Le Guilloux et al. 2009), GHECOM (Kawabata 2010), ConCavity (Capra et al. 2009), POCASA (Yu et al. 2010), PocketPicker (Weisel et al. 2007) and SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009). Some of these methods are described in detail in other chapters. Most existing methods for protein-ligand binding site prediction can be classified into two types: geometry-based and energy-based. The geometry-based methods can be further classified into grid-based, sphere-based and α-shape-based (Kawabata 2010; Yu et al. 2010).
In the grid-based methods, the protein structure is projected onto a 3D grid and the grid points are categorized into types such as “outside protein”, “inside protein” and “near protein surface” according to their positions relative to the protein. The grid points that are not inside the protein are then clustered using geometric attributes, so that the grid points lying at pocket sites can be recognized. LIGSITEcs, GHECOM, PocketPicker and ConCavity are representatives of this type. In LIGSITEcs, the grid points are categorized into three types: inside protein, near surface and in the solvent. Every solvent grid point is scanned along seven directions and evaluated by the number of SSS (surface-solvent-surface) events it has; a grid point with five or more such events is normally located at a pocket site. LIGSITEcs will be explained in detail in the next section. GHECOM also first projects the protein onto a 3D grid, and it uses the theory of mathematical morphology to define the pocket regions on the protein surface. In mathematical morphology (Masuya and Doi 1995), four basic operations (dilation, erosion, opening and closing) are applied with a probe to define a pocket site. In ConCavity, a 3D grid is likewise constructed to enclose the protein. Each grid point is scored using both structural and evolutionary information, and the regions with many high-scoring grid points are considered to be pocket sites.
In the sphere-based approaches, the common strategy is to fill the protein surface with spheres layer by layer, applying a cutting method during the filling. The final pocket sites are the regions that are rich in such spheres. Methods of this kind include SURFNET, PASS, PHECOM (Kawabata and Go 2007) and POCASA (Yu et al. 2010).
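
The seven-direction scan described above for LIGSITEcs can be illustrated with a minimal Python sketch. It is not the LIGSITEcs implementation: the protein is reduced to a set of occupied grid cells (so "inside" and "near surface" are merged into one class), the seven directions are assumed to be the three axes plus the four cube diagonals, and the names `hits_protein`, `sss_counts` and `pocket_cells` as well as the toy cube with a carved cavity are hypothetical.

```python
from itertools import product

# Assumed scan directions: three grid axes plus four cube diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]


def hits_protein(protein, shape, start, step):
    """Walk from start along step; True if a protein cell is met before leaving the grid."""
    pos = tuple(p + s for p, s in zip(start, step))
    while all(0 <= p < n for p, n in zip(pos, shape)):
        if pos in protein:
            return True
        pos = tuple(p + s for p, s in zip(pos, step))
    return False


def sss_counts(protein, shape):
    """For each solvent cell, count directions enclosed by protein on both sides."""
    counts = {}
    for cell in product(*(range(n) for n in shape)):
        if cell in protein:
            continue
        events = 0
        for step in DIRECTIONS:
            back = tuple(-c for c in step)
            if (hits_protein(protein, shape, cell, step)
                    and hits_protein(protein, shape, cell, back)):
                events += 1
        counts[cell] = events
    return counts


def pocket_cells(protein, shape, min_events=5):
    """Solvent cells with at least min_events enclosed directions."""
    return {c for c, n in sss_counts(protein, shape).items() if n >= min_events}


# Toy example: a solid 5x5x5 block with a cavity that is open towards +z.
shape = (9, 9, 9)
block = set(product(range(2, 7), repeat=3))
cavity = set(product(range(3, 6), range(3, 6), range(3, 7)))
protein = block - cavity
pockets = pocket_cells(protein, shape)
```

In this toy grid the cells deep inside the cavity reach the threshold of five events, while bulk-solvent cells outside the block collect none, since any line through them meets the protein on at most one side.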
Approaches based on α-shapes include CAST and Fpocket. CAST computes a triangulation of the protein’s surface atoms, and the resulting triangles are grouped by letting small ones flow towards their larger neighbors. The pocket sites are the collections of empty triangles. In contrast to CAST, Fpocket uses the idea of the α-sphere, a sphere that contacts four atoms on its boundary and contains no atom inside. Clusters of α-spheres that lie close together are then identified, and these clusters are the potential pocket sites.
In contrast to the geometry-based methods, energy-based methods such as Q-SiteFinder (Laurie and Jackson 2005) aim to find pocket sites by computing the interaction energy between the protein atoms and a small-molecule probe. In Q-SiteFinder, layers of methyl (–CH3) probes are placed on the protein surface and the van der Waals interaction energy between the protein atoms and the probes is calculated. The probes are then clustered into groups, and the clusters are ranked by the total energy of their probes. The clusters with the most favorable total interaction energy are the potential ligand binding sites. SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009) is similar to Q-SiteFinder, but it includes Lennard-Jones and electrostatic energy terms and uses different types of probes to calculate the interaction energy. Table 2.1 briefly summarizes the categories of these existing computational methods.
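
The energy-based idea described above (score probes by their interaction energy with the protein, cluster them, rank the clusters by total energy) can be sketched as follows. This is only an illustration under stated assumptions, not the Q-SiteFinder or SiteHound implementation: a generic 12-6 Lennard-Jones term stands in for the force field, and the parameter values (`epsilon`, `r_min`, `cutoff`, `link_distance`, `energy_threshold`), the function names and the toy coordinates are all hypothetical.

```python
import math


def lj_energy(probe, atoms, epsilon=0.1, r_min=4.0, cutoff=10.0):
    """12-6 Lennard-Jones energy between one probe and all atoms within cutoff."""
    total = 0.0
    for atom in atoms:
        r = math.dist(probe, atom)
        if 0 < r <= cutoff:
            ratio = (r_min / r) ** 6
            total += epsilon * (ratio * ratio - 2 * ratio)  # minimum -epsilon at r_min
    return total


def cluster_probes(probes, link_distance=1.5):
    """Single-linkage clustering: probes within link_distance share a cluster."""
    unvisited = set(range(len(probes)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(probes[current], probes[j]) <= link_distance]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def rank_sites(probes, atoms, energy_threshold=-0.05):
    """Keep favorable probes, cluster them, and sort clusters by total energy."""
    energies = [lj_energy(p, atoms) for p in probes]
    kept = [i for i, e in enumerate(energies) if e <= energy_threshold]
    sites = []
    for cluster in cluster_probes([probes[i] for i in kept]):
        members = sorted(kept[i] for i in cluster)
        sites.append((sum(energies[i] for i in members), members))
    return sorted(sites)  # most negative (most favorable) total energy first


# Toy example: two probes between a pair of atoms, one probe near a lone atom,
# and one probe far from everything (hypothetical coordinates, in angstroms).
atoms = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
probes = [(4.0, 0.0, 0.0), (4.0, 1.0, 0.0), (54.0, 0.0, 0.0), (4.0, 30.0, 0.0)]
sites = rank_sites(probes, atoms)
```

Each entry of `sites` is a pair of total energy and probe indices, so the first entry is the candidate site with the most favorable total energy. The probe that is beyond the cutoff of every atom has zero energy and is discarded before clustering.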

[1]  Dario Ghersi,et al.  SITEHOUND-web: a server for ligand binding site identification in protein structures , 2009, Nucleic Acids Res..

[2]  Bingding Huang,et al.  MetaPocket: a meta approach to improve protein ligand binding site prediction. , 2009, Omics : a journal of integrative biology.

[3]  Yong Zhou,et al.  Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere , 2010, Bioinform..

[4]  Pieter F. W. Stouten,et al.  Fast prediction and visualization of protein binding pockets with PASS , 2000, J. Comput. Aided Mol. Des..

[5]  Takeshi Kawabata,et al.  Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites , 2007, Proteins.

[6]  M Hendlich,et al.  LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. , 1997, Journal of molecular graphics & modelling.

[7]  T. Kawabata Detection of multiscale pockets on protein surfaces using mathematical morphology , 2010, Proteins.

[8]  M. L. Connolly Analytical molecular surface calculation , 1983 .

[9]  W. Delano The PyMOL Molecular Graphics System , 2002 .

[10]  Vincent Le Guilloux,et al.  Fpocket: An open source platform for ligand pocket detection , 2009, BMC Bioinformatics.

[11]  Jie Liang,et al.  CASTp: Computed Atlas of Surface Topography of proteins , 2003, Nucleic Acids Res..

[12]  M. Schroeder,et al.  LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation , 2006, BMC Structural Biology.

[13]  G. Schneider,et al.  PocketPicker: analysis of ligand binding-sites with shape descriptors , 2007, Chemistry Central Journal.

[14]  M. Masuya,et al.  Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations. , 1995, Journal of molecular graphics.

[15]  N. Ben-Tal,et al.  The ConSurf‐HSSP database: The mapping of evolutionary conservation among homologs onto PDB structures , 2004, Proteins.

[16]  D. Levitt,et al.  POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. , 1992, Journal of molecular graphics.

[17]  Jie Liang,et al.  CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues , 2006, Nucleic Acids Res..

[18]  Richard M. Jackson,et al.  Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites , 2005, Bioinform..

[19]  Mona Singh,et al.  Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure , 2009, PLoS Comput. Biol..

[20]  R. Laskowski SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. , 1995, Journal of molecular graphics.

[21]  Dario Ghersi,et al.  EASYMIFS and SITEHOUND: a toolkit for the identification of ligand-binding sites in protein structures , 2009, Bioinform..

[22]  Yu Li,et al.  Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction , 2011, Bioinform..