Trends between gene content and genome size in prokaryotic species with larger genomes.

Although the evolutionary process and ecological benefits of symbiotic species with small genomes are well understood, these issues remain poorly elucidated for free-living species with large genomes. We compared 115 completed prokaryotic genomes by using the Clusters of Orthologous Groups (COG) database to determine whether the proportion of the genome attributable to particular cellular processes changes with genome size, because such changes may reflect both cellular and ecological strategies associated with genome expansion. We found that large genomes are disproportionately enriched in regulation and secondary metabolism genes and depleted in protein translation, DNA replication, cell division, and nucleotide metabolism genes compared to medium- and small-sized genomes. Furthermore, large genomes do not accumulate noncoding DNA or hypothetical ORFs, because the proportion of the genome made up of each remained constant with genome size. For cell functions that showed no correlation with genome size, the dispersion around the mean reflects traits other than genome size, or strain-specific processes. For example, Archaea had significantly more genes in energy production, coenzyme metabolism, and the poorly characterized category, and fewer in cell membrane biogenesis and carbohydrate metabolism, than Bacteria. The trends we noted with genome size by using COG were confirmed by our independent analysis with The Institute for Genomic Research's Comprehensive Microbial Resource and the Kyoto Encyclopedia of Genes and Genomes' Orthology annotation databases. These trends suggest that species with larger genomes may dominate in environments where resources are scarce but diverse and where there is little penalty for slow growth, such as soil.
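The kind of comparison described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the category names, the `size_mb` and `counts` fields, and the toy numbers are all hypothetical, and Pearson correlation is used here as a simple stand-in because the abstract does not state which statistic was applied.

```python
from math import sqrt


def category_proportions(gene_counts):
    """Return each functional category's share of a genome's annotated genes."""
    total = sum(gene_counts.values())
    return {category: n / total for category, n in gene_counts.items()}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (assumed statistic, see lead-in)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trend_with_genome_size(genomes, category):
    """Correlate genome size with the proportion of genes in one category."""
    sizes = [g["size_mb"] for g in genomes]
    shares = [category_proportions(g["counts"])[category] for g in genomes]
    return pearson_r(sizes, shares)


# Hypothetical toy data, invented for illustration only (not from the study).
TOY_GENOMES = [
    {"size_mb": 1.0, "counts": {"translation": 120, "regulation": 20, "other": 860}},
    {"size_mb": 4.0, "counts": {"translation": 160, "regulation": 240, "other": 3600}},
    {"size_mb": 8.0, "counts": {"translation": 200, "regulation": 720, "other": 7080}},
]

if __name__ == "__main__":
    for category in ("translation", "regulation"):
        r = trend_with_genome_size(TOY_GENOMES, category)
        print(f"{category}: r = {r:+.2f}")
```

With toy data shaped like the reported trend, the regulation share rises with genome size and the translation share falls, even though the absolute number of translation genes still increases. Working with proportions rather than raw counts is what separates disproportionate enrichment from simple scaling with genome size.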
