The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes

Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and arbitrary thresholds for selecting significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that reports improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust both to high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments, driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other but indicative of community responses to stressors and environmental conditions.
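The abstract names three adjustments (selecting similarities by relative alignment length, similarity weighting, and length normalization) that together yield relative abundances and an average genome length. The Python sketch below strings them together on toy data to show how the pieces could fit; it is not the GAAS implementation. The E-value and relative-length thresholds, the bit-score weighting rule, the `Hit` record, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST similarity between a metagenomic read and a reference genome."""
    query: str       # read identifier
    genome: str      # reference genome identifier
    evalue: float
    bitscore: float
    aln_len: int     # aligned length on the read (bp)
    query_len: int   # full read length (bp)


def filter_hits(hits, max_evalue=1e-3, min_rel_aln=0.6):
    """Keep hits that pass an E-value cutoff and cover enough of the read.

    Relative alignment length (aln_len / query_len) keeps the cutoff
    comparable across read lengths. Both thresholds are placeholders.
    """
    return [h for h in hits
            if h.evalue <= max_evalue and h.aln_len / h.query_len >= min_rel_aln]


def weight_hits(hits):
    """Share each read's single unit of evidence among its significant hits.

    Hypothetical rule: within a read, weights are proportional to bit score.
    Returns the summed weight per genome.
    """
    by_read = defaultdict(list)
    for h in hits:
        by_read[h.query].append(h)
    genome_weight = defaultdict(float)
    for read_hits in by_read.values():
        total = sum(h.bitscore for h in read_hits)
        for h in read_hits:
            genome_weight[h.genome] += h.bitscore / total
    return dict(genome_weight)


def relative_abundance(genome_weight, genome_len):
    """Divide by genome length (long genomes shed more reads per copy), then rescale to sum to 1."""
    per_copy = {g: w / genome_len[g] for g, w in genome_weight.items()}
    total = sum(per_copy.values())
    return {g: v / total for g, v in per_copy.items()}


def average_genome_length(abundance, genome_len):
    """Abundance-weighted mean genome length of the community (bp)."""
    return sum(a * genome_len[g] for g, a in abundance.items())


if __name__ == "__main__":
    genome_len = {"phageA": 40_000, "bactB": 4_000_000}
    hits = [
        Hit("r1", "phageA", 1e-10, 100, 90, 100),
        Hit("r2", "phageA", 1e-8, 80, 95, 100),
        Hit("r3", "bactB", 1e-20, 150, 100, 100),
        Hit("r3", "phageA", 1e-20, 50, 100, 100),
        Hit("r4", "bactB", 1e-5, 60, 30, 100),   # dropped: covers only 30% of the read
        Hit("r5", "bactB", 0.5, 40, 100, 100),   # dropped: E-value too high
    ]
    weights = weight_hits(filter_hits(hits))
    abundance = relative_abundance(weights, genome_len)
    print(abundance, round(average_genome_length(abundance, genome_len)))
```

On this toy data the small phage genome receives 2.25 of the 3 read-weights (75%), but after dividing by genome length it accounts for about 99.7% of genome copies, and the average genome length comes out near 53,156 bp. Using the raw 75%/25% weight fractions instead would give 1,030,000 bp, which illustrates how skipping length normalization inflates the apparent share of large genomes.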
