Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell.

Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%-35% of the gene complement of each genome and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of the genes for metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were also identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into two groups, those closest to bacterial homologs and those closest to eukaryotic homologs. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes, whose evolution must have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to their bacterial homologs, which suggests a significant role of horizontal gene transfer from bacteria in the evolution of the Euryarchaeota.
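
The division into a stable core and a variable shell can be illustrated with a minimal sketch. It assumes that orthologous sets have already been computed and are stored as a mapping from a family name to the genes found in each species. The species abbreviations, family names, gene identifiers, and genome sizes below are invented toy values, not data from the study, and the functions `classify_families` and `core_fraction` are hypothetical helpers rather than the procedure actually used.

```python
# Toy illustration: partition orthologous sets by species coverage.
# All identifiers and numbers here are made up for the example.

SPECIES = ("Mj", "Mt", "Af", "Ph")  # illustrative abbreviations for the four genomes


def classify_families(families, species=SPECIES):
    """Return (core, shell) lists of family names.

    core  : families with at least one member in every species
    shell : families present in two or more species, but not all
    """
    core, shell = [], []
    for name, members in families.items():
        present = {sp for sp in species if members.get(sp)}
        if len(present) == len(species):
            core.append(name)
        elif len(present) >= 2:
            shell.append(name)
    return sorted(core), sorted(shell)


def core_fraction(families, core, genome_sizes):
    """Fraction of each genome's genes that fall into core families."""
    fractions = {}
    for sp, total in genome_sizes.items():
        in_core = sum(len(families[name].get(sp, [])) for name in core)
        fractions[sp] = in_core / total
    return fractions


# Hypothetical input: three families over four toy genomes of four genes each.
toy_families = {
    "famA": {"Mj": ["mj1"], "Mt": ["mt1"], "Af": ["af1", "af2"], "Ph": ["ph1"]},
    "famB": {"Mj": ["mj2"], "Mt": ["mt2"]},
    "famC": {"Mj": ["mj3"], "Af": ["af3"], "Ph": ["ph2"]},
}
toy_genome_sizes = {"Mj": 4, "Mt": 4, "Af": 4, "Ph": 4}

core, shell = classify_families(toy_families)
fractions = core_fraction(toy_families, core, toy_genome_sizes)
```

In this toy input, `famA` is the only core family, while `famB` and `famC` fall into the shell. Because a family may contain paralogs, the core fraction is counted in genes rather than families, which is why it can differ between genomes.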
