The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms

Background
The entire evolutionary history of life can be studied using myriad sequences generated by genomic research. This includes the appearance of the first cells and of superkingdoms Archaea, Bacteria, and Eukarya. However, the use of molecular sequence information for deep phylogenetic analyses is limited by mutational saturation, differential evolutionary rates, lack of sequence site independence, and other biological and technical constraints. In contrast, protein structures are evolutionary modules that are highly conserved and diverse enough to enable deep historical exploration.

Results
Here we build phylogenies that describe the evolution of proteins and proteomes. These phylogenetic trees are derived from a genomic census of protein domains defined at the fold family (FF) level of structural classification. Phylogenomic trees of FF structures were reconstructed from genomic abundance levels of 2,397 FFs in 420 proteomes of free-living organisms. These trees defined timelines of domain appearance, with time spanning from the origin of proteins to the present. Timelines are divided into five different evolutionary phases according to patterns of sharing of FFs among superkingdoms: (1) a primordial protein world, (2) reductive evolution and the rise of Archaea, (3) the rise of Bacteria from the common ancestor of Bacteria and Eukarya and early development of the three superkingdoms, (4) the rise of Eukarya and widespread organismal diversification, and (5) eukaryal diversification. The relative ancestry of the FFs shows that reductive evolution by domain loss is dominant in the first three phases and is responsible for both the diversification of life from a universal cellular ancestor and the appearance of superkingdoms. On the other hand, domain gains are predominant in the last two phases and are responsible for organismal diversification, especially in Bacteria and Eukarya.

Conclusions
The evolution of functions that are associated with corresponding FFs along the timeline reveals that primordial metabolic domains evolved earlier than informational domains involved in translation and transcription, supporting the metabolism-first hypothesis rather than the RNA world scenario. In addition, phylogenomic trees of proteomes reconstructed from FFs appearing in each of the five phases of the protein world show that trees reconstructed from ancient domain structures were consistently rooted in archaeal lineages, supporting the proposal that the archaeal ancestor is more ancient than the ancestors of other superkingdoms.
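
The census of FF sharing among superkingdoms that underlies the five phases can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `census` dictionary, the FF identifiers, the proteome names, and both helper functions are hypothetical, and the counts are toy values rather than data from the study. The sketch only shows how an abundance table could be reduced to sharing groups (for example `ABE` for an FF present in all three superkingdoms) and to the fraction of proteomes that contain each FF.

```python
from collections import defaultdict

# Hypothetical toy census: proteome -> (superkingdom, {FF id: domain count}).
# A = Archaea, B = Bacteria, E = Eukarya. Values are illustrative only.
census = {
    "archaeon_1": ("A", {"ff1": 3, "ff2": 1}),
    "bacterium_1": ("B", {"ff1": 5, "ff3": 2}),
    "eukaryote_1": ("E", {"ff1": 9, "ff3": 4, "ff4": 7}),
}


def sharing_groups(census):
    """Map each FF to the sorted string of superkingdoms in which it occurs."""
    seen = defaultdict(set)
    for superkingdom, counts in census.values():
        for ff, n in counts.items():
            if n > 0:
                seen[ff].add(superkingdom)
    return {ff: "".join(sorted(kingdoms)) for ff, kingdoms in seen.items()}


def proteome_fraction(census):
    """Fraction of proteomes in which each FF is present at least once."""
    total = len(census)
    present = defaultdict(int)
    for _, counts in census.values():
        for ff, n in counts.items():
            if n > 0:
                present[ff] += 1
    return {ff: k / total for ff, k in present.items()}


groups = sharing_groups(census)
fractions = proteome_fraction(census)
```

In this toy input, `ff1` falls in the group shared by all three superkingdoms, while `ff2` is found only in the archaeal proteome. Reconstructing the phylogenomic trees themselves requires dedicated phylogenetic software and is outside the scope of this sketch.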
