Functional diversity within protein superfamilies

Structural genomics projects are leading to the discovery of relationships between proteins that would not have been anticipated from consideration of sequence alone. However, the assignment of function via structure remains difficult, as some structures are compatible with a variety of functions. In this study we explore the relationships between structural diversity and functional diversity within distantly related members of SCOP superfamilies. We use the Gene Ontology functional classification scheme and Green's path entropy to measure functional diversity. We observe a negative correlation between the functional entropy of a superfamily and the size of the conserved core.
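As a rough illustration of what an entropy-based diversity score over functional annotations looks like, the sketch below computes plain Shannon entropy over the Gene Ontology terms assigned to the members of one superfamily. This is a simplified stand-in and not Green's path entropy, which is defined on rooted trees and would take the GO hierarchy into account. The function name and the example annotation lists are assumptions made for illustration only.

```python
import math
from collections import Counter


def functional_entropy(go_terms):
    """Shannon entropy (in bits) of a list of GO term labels.

    Simplified stand-in for a tree-aware measure: every term is treated
    as an independent category and the GO hierarchy is ignored.
    """
    counts = Counter(go_terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        (n / total) * math.log2(total / n) for n in counts.values()
    )


# Hypothetical annotations for two superfamilies (illustrative only).
uniform_family = ["GO:0003824"] * 4
mixed_family = ["GO:0003824", "GO:0003824", "GO:0005488", "GO:0005488"]

low = functional_entropy(uniform_family)
high = functional_entropy(mixed_family)
```

Under this simplification, a superfamily whose members all share one term scores zero, and the score rises as annotations spread over more terms.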
