Large-Scale Evolutionary Patterns of Protein Domain Distributions in Eukaryotes

The genomic inventory of protein domains is an important indicator of an organism’s regulatory and metabolic capabilities. Existing gene annotations, however, can suffer from substantial ascertainment biases that make it difficult to obtain and compare quantitative domain data. We find that quantitative trends across the Eukarya can be investigated by combining gene prediction with standard domain annotation pipelines. Species-specific training is required, however, to account for the genomic peculiarities of many lineages. In contrast to earlier studies, we find widespread, statistically significant avoidance of protein domains associated with distinct functional high-level Gene Ontology terms.

1998 ACM Subject Classification J.3 Life and Medical Sciences
