sc-PDB: an Annotated Database of Druggable Binding Sites from the Protein Data Bank

The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at http://bioinfo-pharma.u-strasbg.fr/scPDB/.
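The binding-site definition above (every residue with at least one atom within 6.5 angstroms of any ligand atom) can be illustrated with a minimal sketch. This is not the sc-PDB implementation: the function name `binding_site_residues`, the `(residue_id, (x, y, z))` input layout, and the toy coordinates are assumptions made only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

CUTOFF = 6.5  # angstroms, the distance threshold stated in the abstract


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return residue ids having at least one atom within `cutoff` of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs (hypothetical layout).
    ligand_atoms:  iterable of (x, y, z) coordinates.
    Residue ids are returned in order of first appearance, without duplicates.
    """
    ligand_atoms = list(ligand_atoms)
    selected = []
    seen = set()
    for residue_id, coord in protein_atoms:
        if residue_id in seen:
            continue  # residue already selected through another atom
        if any(dist(coord, lig) <= cutoff for lig in ligand_atoms):
            seen.add(residue_id)
            selected.append(residue_id)
    return selected


# Toy coordinates, invented for the example (not taken from any PDB entry).
protein = [
    ("ALA1", (0.0, 0.0, 0.0)),
    ("ALA1", (1.0, 0.0, 0.0)),
    ("GLY2", (10.0, 0.0, 0.0)),
    ("GLY2", (5.0, 0.0, 0.0)),
    ("ZN3", (0.0, 20.0, 0.0)),
]
ligand = [(0.0, 0.0, 3.0)]

site = binding_site_residues(protein, ligand)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single complex; a spatial index such as a k-d tree would be a natural substitute when scanning many structures.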
