Gene ontology functional annotations at the structural domain level

Most proteins are organized into domains, which can be regarded as independent modular units in terms of molecular function (MF). Nevertheless, current functional annotations are made on a “whole‐chain” basis, without associating specific functions with individual domains. We present here an automatic method for discerning which structural domain within a protein is responsible for a given MF originally attributed to the whole protein. By annotating SCOP structural domains with Gene Ontology (GO) terms using this method, we obtained the first large‐scale functional annotation at the domain level. We compared these annotations on a large scale with those implicit in the functional annotations of InterPro signatures and found that this method performs better overall. We also discuss some particular examples in detail. Generated automatically and available online, this resource could serve as the basis for future manually curated annotations. Proteins 2009. © 2009 Wiley‐Liss, Inc.
