On the detection of functionally coherent groups of protein domains with an extension to protein annotation

Background: Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods for identifying functional domain combinations are limited in scope, e.g. to combinations that fall within individual proteins or within specific regions of a translated genome. Further effort is needed to identify groups of domains that span two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation.

Results: Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of the budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units, the domains were found to function cooperatively, with each domain contributing to a different aspect of the unit's overall function. The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as sets of genes sharing similar expression profiles and presumably similar functions. This observation further supports the functional coherence of DASSEM units. The functional linkages of the units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition.

Conclusion: A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology-based tool to annotate uncharacterized proteins.
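The abstract does not say which statistic was used to establish that DASSEM unit domains are enriched among transcription-module proteins. As an illustration only, the sketch below shows one common way to pose such a question, a one-sided hypergeometric test. The function name, its parameters, and the toy counts are assumptions made for this example and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of proteins in the proteome (hypothetical input)
    annotated:  proteins carrying at least one domain of a given unit
    sample:     proteins in one transcription module
    hits:       module proteins that carry a domain of the unit
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for k = hits .. upper overlapping proteins.
    # math.comb returns 0 when k exceeds n, so impossible terms vanish.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy counts, invented for illustration; not data from the study.
    p = hypergeom_enrichment_p(population=6000, annotated=40, sample=50, hits=5)
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value would indicate that the module contains more unit-bearing proteins than expected by chance. When many units and modules are tested, a multiple-testing correction would also be needed, which this sketch leaves out.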
