Identification of the ligand binding sites on the molecular surface of proteins

Identifying the biochemical functions of proteins from their three-dimensional structures has become a necessity in the post-genome-sequencing era. Ligand binding is one of the major biochemical functions of proteins, so identifying ligands and their binding sites is the starting point for identifying function. We previously reported our first attempt at structure-based function prediction, which relied on similarity searches of molecular surfaces against a functional-site database. Here we extend that approach by expanding the search database to all heteroatom binding sites in the Protein Data Bank (PDB) and by introducing a new analysis protocol. In addition, we determined a similarity threshold line using 10 structure pairs, each consisting of solved free and complexed structures of the same protein. Finally, we applied the method extensively to newly determined structures of hypothetical proteins, including some without functional annotations, and evaluated its performance.
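The abstract does not say how the threshold line was placed, so the following is only a generic, hypothetical illustration of calibrating a cutoff on labelled data. It assumes each search hit has a single scalar similarity score and that hits from the known free/complex pairs can be labelled as correct (true binding site) or incorrect. The function `choose_threshold` and the scores in the usage example are invented for illustration and are not taken from the paper. The function picks the cutoff that correctly classifies the most labelled hits.

```python
from typing import Sequence


def choose_threshold(true_scores: Sequence[float], false_scores: Sequence[float]) -> float:
    """Return the score cutoff that correctly classifies the most labelled hits.

    Hypothetical sketch: a hit is predicted to be a real binding site when
    its score >= cutoff. Candidate cutoffs are the observed scores; ties are
    broken toward the higher cutoff, which admits fewer false positives.
    """
    candidates = sorted(set(true_scores) | set(false_scores))
    if not candidates:
        raise ValueError("need at least one labelled score")
    best_cut, best_correct = candidates[0], -1
    for cut in candidates:  # ascending, so '>=' keeps the highest tied cutoff
        correct = sum(s >= cut for s in true_scores) + sum(s < cut for s in false_scores)
        if correct >= best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


# Invented scores: hits matching the known ligand site vs. unrelated hits.
print(choose_threshold([0.82, 0.88, 0.91], [0.40, 0.55, 0.78]))  # 0.82
```

Maximizing the number of correct calls is only one possible criterion. A real calibration might instead weight false positives more heavily, or draw a line in a two-dimensional space, for example score versus site size.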
