Prediction of effective genome size in metagenomic samples

We introduce a novel computational approach to predict effective genome size (EGS; a measure that includes multiple plasmid copies, inserted sequences, and associated phages and viruses) from the short sequencing reads of environmental genomics (metagenomics) projects. We observe considerable differences in EGS between environments and link them to ecological complexity as well as species composition (for instance, the presence of eukaryotes). For example, we estimate the EGS of a complex, organism-dense farm soil sample at about 6.3 megabases (Mb), whereas that of the bacteria therein is only 4.7 Mb; for bacteria in a nutrient-poor, organism-sparse ocean surface water sample, EGS is as low as 1.6 Mb. The method also permits evaluation of completion status and assembly bias in single-genome sequencing projects.
