PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics

Background: The number of protein structures deposited in the Protein Data Bank (PDB) by structural genomics centers is increasing dramatically. Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to infer function from structural similarity alone.

Results: Here we present the PDB-UF database, a web-accessible collection of predicted enzymatic properties based on structure-function relationships. The assignments were made for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that four hypothetical proteins (PDB accession codes 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes.

Conclusion: We suggest that structure-based prediction of an EC number should use a different similarity score cutoff for each protein fold. Moreover, performing the annotation with two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures whose function is marked as "unknown" in the PDB file.

Availability: http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF
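The two suggestions in the conclusion (a fold-specific similarity cutoff, and agreement between two independent algorithms) can be combined into a simple decision rule. The following is a minimal sketch of that idea, not the PDB-UF implementation: the fold labels, the cutoff values, the assumption that both methods report scores on a shared 0 to 1 scale, and the function name `consensus_ec` are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical per-fold similarity cutoffs (illustrative values only).
FOLD_CUTOFFS = {"fold_A": 0.60, "fold_B": 0.80}
# Hypothetical fallback cutoff for folds without a tuned value.
DEFAULT_CUTOFF = 0.75

# A hit is (predicted EC number, similarity score); None means "no hit".
Hit = Optional[Tuple[str, float]]


def consensus_ec(fold: str, hit_a: Hit, hit_b: Hit) -> Optional[str]:
    """Return an EC number only if both methods agree above the fold's cutoff."""
    if hit_a is None or hit_b is None:
        return None  # one of the two methods found nothing
    cutoff = FOLD_CUTOFFS.get(fold, DEFAULT_CUTOFF)
    ec_a, score_a = hit_a
    ec_b, score_b = hit_b
    if score_a < cutoff or score_b < cutoff:
        return None  # at least one hit is too weak for this fold
    if ec_a != ec_b:
        return None  # the methods disagree, so no assignment is made
    return ec_a


if __name__ == "__main__":
    # Same pair of hits, judged against two different fold cutoffs.
    hits = (("2.1.1.-", 0.65), ("2.1.1.-", 0.70))
    print(consensus_ec("fold_A", *hits))
    print(consensus_ec("fold_B", *hits))
```

Requiring agreement trades coverage for precision: a structure that only one method can match stays unannotated, which is the intended behavior when the goal is to reduce false positive assignments.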
