Signature Genes as a Phylogenomic Tool

Gene content carries a strong phylogenetic signal, but its use for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and until now it has required completely sequenced genomes. Here, we introduce an approach that uses signature genes for phylogenetic classification, which allows the phylogenetic signal in gene content to be applied to any set of sequences. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an uncharacterized sample can be used to determine its taxonomic composition. We identify 8,362 signature genes specific to 112 prokaryotic taxa. These signature genes can address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses give an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomic data. Cross-validation experiments leaving out up to 30% of the species show that ∼92% of the signature genes correctly place the species in a related clade. Analyses of metagenomic data sets with the signature gene approach agree well with previously reported species distributions based on phylogenetic analysis of marker genes. In summary, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
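
The idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a signature gene is a gene family found in genomes of exactly one taxon and in no genome outside it, and that sample sequences have already been assigned to gene families. The function names, family labels, and toy genomes are hypothetical.

```python
from collections import Counter


def find_signature_genes(genome_genes, genome_taxon):
    """Map each gene family seen in exactly one taxon to that taxon.

    Assumption: "signature gene" means a family present in one taxon only.
    """
    taxa_per_gene = {}
    for genome, genes in genome_genes.items():
        for gene in genes:
            taxa_per_gene.setdefault(gene, set()).add(genome_taxon[genome])
    return {
        gene: next(iter(taxa))
        for gene, taxa in taxa_per_gene.items()
        if len(taxa) == 1
    }


def profile_sample(sample_genes, signatures):
    """Return the fraction of signature-gene hits per taxon in a sample."""
    hits = Counter(signatures[g] for g in sample_genes if g in signatures)
    total = sum(hits.values())
    if total == 0:
        return {}
    return {taxon: count / total for taxon, count in hits.items()}


# Hypothetical reference genomes and their gene families.
genome_genes = {
    "genome_A1": {"fam1", "fam2", "fam9"},
    "genome_A2": {"fam1", "fam3", "fam9"},
    "genome_B1": {"fam4", "fam9"},
}
genome_taxon = {
    "genome_A1": "taxon_A",
    "genome_A2": "taxon_A",
    "genome_B1": "taxon_B",
}

signatures = find_signature_genes(genome_genes, genome_taxon)

# Hypothetical sample: families detected among unassembled sequences.
profile = profile_sample(["fam1", "fam2", "fam4", "fam9", "fam7"], signatures)
print(profile)
```

In this toy case fam9 occurs in both taxa and is therefore not a signature gene, and fam7 is unknown to the reference set, so neither contributes to the profile. A real analysis would also need a homology search step and a way to handle taxa at nested depths, both of which are outside this sketch.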
