Evolution and Quantitative Comparison of Genome-Wide Protein Domain Distributions

The metabolic and regulatory capabilities of an organism are implicit in its protein content. This content is often hard to estimate, however, because of ascertainment biases inherent in the available genome annotations. An organism's complement of recognizable functional protein domains and their combinations conveys essentially the same information and is, at the same time, much more readily accessible, although protein domain models trained on one phylogenetic group frequently fail on distantly related sequences. Pooling related domain models according to their GO annotations, in combination with de novo gene prediction methods, provides estimates that appear to be less affected by phylogenetic biases. Here we show, for 18 diverse representatives of all eukaryotic kingdoms, that a pooled analysis of the tendencies of protein domains to co-occur with or avoid one another is indeed feasible. This type of analysis can reveal general large-scale patterns of domain co-occurrence and helps to identify lineage-specific variations in the evolution of protein domains. Somewhat surprisingly, we do not find strong ubiquitous patterns governing the evolutionary behavior of specific functional classes. Instead, there are strong variations between the major groups of eukaryotes, pointing to systematic differences in their evolutionary constraints.
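The abstract does not spell out which co-occurrence statistic is used. The following is a minimal sketch, assuming a simple observed-versus-expected log-ratio under independence, which is not necessarily the measure used in the study. Domain hits per protein are first collapsed into functional classes through a hypothetical mapping, standing in for the GO-based pooling of domain models. Each pair of classes is then scored, with positive values indicating a tendency to co-occur within the same protein and negative values indicating avoidance. All domain model and class names below are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical mapping from individual domain models to pooled functional
# classes (standing in for GO-based pooling); names are illustrative only.
POOLS = {
    "zf_model_1": "zinc_finger",
    "zf_model_2": "zinc_finger",
    "krab_model": "KRAB",
    "kinase_model": "kinase",
}


def pooled_architectures(proteins, pools):
    """Collapse each protein's domain hits into the set of pooled classes.

    Only presence matters (repeated domains count once); hits without a
    pool are ignored, and proteins with no pooled hit are dropped.
    """
    archs = []
    for hits in proteins:
        classes = {pools[d] for d in hits if d in pools}
        if classes:
            archs.append(classes)
    return archs


def cooccurrence_log_ratios(archs):
    """Score each class pair by log2(observed / expected) co-occurrence.

    The expected count of proteins containing both classes assumes the two
    classes occur independently across proteins.
    """
    n = len(archs)
    single = Counter(c for a in archs for c in a)
    pair = Counter(p for a in archs for p in combinations(sorted(a), 2))
    scores = {}
    for a, b in combinations(sorted(single), 2):
        expected = single[a] * single[b] / n
        observed = pair[(a, b)]
        # A pseudocount of 0.5 avoids log(0) for pairs that never co-occur.
        scores[(a, b)] = math.log2((observed + 0.5) / (expected + 0.5))
    return scores


# Toy input: each protein is a list of hypothetical domain-model hits.
toy_proteins = [
    ["zf_model_1", "krab_model"],
    ["zf_model_2", "krab_model"],
    ["zf_model_1", "zf_model_2"],
    ["kinase_model"],
    ["kinase_model"],
    ["kinase_model", "unpooled_model"],
]

scores = cooccurrence_log_ratios(pooled_architectures(toy_proteins, POOLS))
for (a, b), s in sorted(scores.items()):
    print(f"{a:12s} {b:12s} {s:+.3f}")
```

In this toy input the KRAB and zinc-finger classes score positively (they co-occur), while kinase scores negatively against both (it avoids them). Comparing such per-genome score tables across species is one conceivable way to expose the lineage-specific differences described above; the pseudocount is an arbitrary choice that damps the scores of rare pairs.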
