Quantitative Comparison of Genomic-Wide Protein Domain Distributions

Investigations into the origins and evolution of regulatory mechanisms require quantitative estimates of the abundance and co-occurrence of functional protein domains among distantly related genomes. Currently available databases, such as SUPERFAMILY, are not designed for quantitative comparisons because they are built upon transcript and protein annotations provided by the individual genome annotation projects. Differences in genome annotation protocols, which depend strongly on the availability of transcript information and of well-annotated closely related organisms, introduce large biases. Here we show that combining de novo gene predictors with subsequent HMM-based annotation of SCOP domains in the predicted peptides yields consistent estimates of acceptable accuracy, which can be used in particular for systematic studies of the evolution of protein domain occurrences and co-occurrences. As an application, we considered four major classes of DNA-binding domains: zinc-finger, leucine-zipper, winged-helix, and HMG-box. We found that different types of DNA-binding domains systematically avoid each other throughout the evolution of Eukarya. In contrast, DNA-binding domains belonging to the same superfamily readily co-occur in the same protein.
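The occurrence and co-occurrence statistics described above can be illustrated with a minimal sketch. It assumes that gene prediction and HMM-based domain annotation have already been run and that their output has been reduced to a mapping from each predicted peptide to the list of domain superfamilies found in it. The function name `summarize_domains`, the peptide identifiers, and the example annotations are hypothetical and serve only as illustration; this is not the pipeline used in the study.

```python
from collections import Counter
from itertools import combinations


def summarize_domains(protein_domains):
    """Tally domain statistics for one genome.

    protein_domains maps a peptide ID to the list of domain
    superfamilies annotated in that peptide (assumed input format).
    """
    occurrences = Counter()  # peptides containing a domain at least once
    repeats = Counter()      # peptides containing the same domain more than once
    pairs = Counter()        # peptides containing two different domains
    for domains in protein_domains.values():
        per_protein = Counter(domains)
        unique = sorted(per_protein)
        occurrences.update(unique)
        repeats.update(d for d, n in per_protein.items() if n > 1)
        # Sorted input gives each unordered pair one canonical key.
        pairs.update(combinations(unique, 2))
    return occurrences, repeats, pairs


# Hypothetical annotations for four predicted peptides.
example = {
    "pep1": ["zinc-finger", "zinc-finger"],
    "pep2": ["winged-helix"],
    "pep3": ["zinc-finger", "HMG-box"],
    "pep4": ["leucine-zipper"],
}

occurrences, repeats, pairs = summarize_domains(example)
print(occurrences)
print(repeats)
print(pairs)
```

Counting each domain at most once per peptide keeps the occurrence and pair tallies comparable across genomes with different repeat content, while the separate `repeats` tally captures co-occurrence of domains from the same superfamily within one protein.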
