Comprehensive identification of "druggable" protein ligand binding sites.

We have developed a new computational algorithm for de novo identification of protein-ligand binding pockets and validated it at large scale on two datasets collected systematically from all crystallographic structures in the Protein Data Bank (PDB). The algorithm, called DrugSite, takes a three-dimensional protein structure as input and returns the location, volume and shape of putative small-molecule binding sites, using a physical potential and no knowledge of any potential ligand molecule. We validated the method on 17,126 binding sites from complexes and apo-structures in the PDB. Of 5,616 binding sites from protein-ligand complexes, 98.8% were identified by predicted pockets; in only 1.2% of cases was no "pocket density" found at the ligand location. In proteins with known binding sites, 80.9% of the sites were predicted by the largest predicted pocket and 92.7% by the first two. On average, the predicted contact area of the predicted pockets amounted to 4.7% of the total surface area of the protein. Further, 98.6% of 11,510 binding sites collected from apo-structures were predicted. The algorithm is accurate and fast enough to predict protein-ligand binding sites in uncharacterized protein structures, suggest new allosteric druggable pockets, evaluate the druggability of protein-protein interfaces and prioritize molecular targets by druggability. Furthermore, the known and predicted binding pockets for the proteome of a particular organism can be clustered into a "pocketome" that can be used for rapid evaluation of possible binding partners of a given chemical compound.
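The ranked-pocket figures above (80.9% for the largest pocket, 92.7% for the first two) are top-k coverage statistics. The following is a minimal sketch of how such a statistic could be computed, not the authors' evaluation code. The function name `top_k_coverage`, the rank encoding and the toy ranks are hypothetical, and the abstract does not state the criterion used to decide that a pocket matches a site. The last lines use only numbers from the abstract to check that the two datasets sum to the reported total and to estimate a pooled identification rate.

```python
from typing import Optional, Sequence


def top_k_coverage(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of known sites matched by one of the k top-ranked pockets.

    ranks[i] is the 1-based rank (1 = largest predicted pocket) of the
    best-ranked pocket matching known site i, or None if no predicted
    pocket matches that site. The matching criterion itself is assumed
    to have been applied upstream; it is not specified in the abstract.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical toy data for ten known sites (illustration only).
toy_ranks = [1, 1, 2, 1, None, 3, 1, 2, 1, 1]
top1 = top_k_coverage(toy_ranks, 1)
top2 = top_k_coverage(toy_ranks, 2)

# Consistency check using only the numbers reported in the abstract.
complex_sites = 5616   # sites from protein-ligand complexes
apo_sites = 11510      # sites collected from apo-structures
total_sites = complex_sites + apo_sites

# Pooled rate implied by the per-dataset percentages (approximate,
# because the reported percentages are rounded).
pooled_rate = (0.988 * complex_sites + 0.986 * apo_sites) / total_sites

print(f"toy top-1: {top1:.2f}, toy top-2: {top2:.2f}")
print(f"total sites: {total_sites}, pooled rate: {pooled_rate:.3f}")
```

Under these assumptions the pooled rate across both datasets comes out at roughly 98.7%; the exact value depends on the unrounded per-dataset counts, which the abstract does not give.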
