Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms

The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are a direct consequence of their structure, and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is devoted to functions related to metabolic processes, but there are significant differences in the usage of domains for regulatory and extracellular processes both within and between superkingdoms. Our results support the hypothesis that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms directed towards innovating new domain architectures for regulatory and extra/intracellular process functions. Such functions are needed, for example, to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of the microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simpler and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. Finally, we identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes, and Guillardia theta. These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism, and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum did not differ from that of other bacteria, failing to support claims that they represent a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns, suggesting an ancestral evolutionary link between these groups.
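The idea of a functional distribution, and of outliers that devote more domains to information than to metabolism, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the category labels, and the toy proteomes are all hypothetical, and a real analysis would start from structure-based domain assignments and their functional annotations.

```python
from collections import Counter


def functional_distribution(domain_categories):
    """Return the fraction of domains assigned to each functional category.

    `domain_categories` holds one category label per annotated domain
    in a proteome (labels here are illustrative assumptions).
    """
    counts = Counter(domain_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def is_information_dominated(distribution):
    """True when information domains outnumber metabolism domains."""
    return distribution.get("information", 0.0) > distribution.get("metabolism", 0.0)


# Hypothetical toy proteomes, invented purely for illustration.
free_living = (["metabolism"] * 5 + ["information"] * 2
               + ["regulation"] * 2 + ["extracellular"])
reduced = ["information"] * 5 + ["metabolism"] * 2 + ["regulation"]

free_dist = functional_distribution(free_living)
reduced_dist = functional_distribution(reduced)

print(free_dist)
print(is_information_dominated(free_dist), is_information_dominated(reduced_dist))
```

In this toy setting the first proteome follows the metabolism-heavy pattern described for most organisms, while the second resembles the reduced-genome outliers; comparing such distributions across many genomes is the kind of analysis the abstract summarizes.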
