Evolution of protein domain promiscuity in eukaryotes.

Numerous eukaryotic proteins contain multiple domains. Certain domains show a tendency to occur in diverse domain architectures and can be considered "promiscuous." These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those that contribute to signal transduction. A systematic comparative-genomic analysis of promiscuous domains in eukaryotes is described. Two quantitative measures of domain promiscuity are introduced and applied to the analysis of 28 genomes of diverse eukaryotes. Altogether, 215 domains are identified as strongly promiscuous. The fraction of promiscuous domains in animals is shown to be significantly greater than that in fungi or plants. Evolutionary reconstructions indicate that domain promiscuity is a volatile, relatively fast-changing feature of eukaryotic proteins, with few domains remaining promiscuous throughout the evolution of eukaryotes. Some domains appear to have attained promiscuity independently in different lineages, for example, in animals and in plants. It is proposed that promiscuous domains persist within a relatively small pool of evolutionarily stable domain combinations from which numerous rare architectures emerge during evolution. Domain promiscuity correlates positively with the number of experimentally detected domain interactions and with the strength of purifying selection affecting a domain. Thus, evolution of promiscuous domains seems to be constrained by the diversity of their interaction partners. The set of promiscuous domains is enriched for domains that mediate protein-protein interactions involved in various forms of signal transduction, especially in the ubiquitin system and in chromatin. Overall, a limited repertoire of promiscuous domains makes a major contribution to the diversity and evolvability of eukaryotic proteomes and signaling networks.
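
As a rough illustration of what "occurring in diverse domain architectures" can mean computationally, the minimal sketch below counts, for each domain, the number of distinct domains found directly adjacent to it across a set of architectures. This is an assumed simplification for illustration only and is not one of the two measures introduced in the study; the function name `neighbor_counts` and the toy architectures (domains labeled A to D) are hypothetical.

```python
from collections import defaultdict


def neighbor_counts(architectures):
    """Return {domain: number of distinct domains adjacent to it}.

    Each architecture is an ordered sequence of domain labels.
    Assumption: diversity of immediate neighbors is used as a crude
    proxy for promiscuity; single-domain proteins contribute nothing.
    """
    neighbors = defaultdict(set)
    for arch in architectures:
        # Walk over consecutive domain pairs in the architecture.
        for left, right in zip(arch, arch[1:]):
            if left != right:  # ignore tandem repeats of the same domain
                neighbors[left].add(right)
                neighbors[right].add(left)
    return {domain: len(partners) for domain, partners in neighbors.items()}


# Hypothetical toy data, not taken from any genome.
toy_architectures = [
    ("A", "B"),
    ("A", "C"),
    ("D", "A"),
    ("B", "C"),
    ("A", "A", "B"),
]

counts = neighbor_counts(toy_architectures)
for domain in sorted(counts, key=counts.get, reverse=True):
    print(domain, counts[domain])
```

In this toy set, domain A has the most distinct neighbors and would rank as the most promiscuous under this crude proxy. A raw neighbor count of this kind ignores how abundant a domain is, which is one reason a real analysis would likely need a normalized measure.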
