Evaluation of viral genome assembly and diversity estimation in deep metagenomes

Background: Viruses have unique properties, such as small genomes and regions of high similarity, whose effects on metagenomic assemblies have not yet been characterized. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. It also investigates the suitability of different methods for estimating viral diversity in metagenomes.

Results: We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler proved subpar compared to IDBA_UD and CAMERA, which are metagenome-specific. Up to a saturation point, Illumina platforms proved more capable than 454 of reconstructing large portions of viral genomes. Read length was an important factor in limiting chimericity, while scaffolding marginally improved contig length and accuracy. The genome length of the various viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, both of which were orders of magnitude above the expected values.

Conclusions: Assemblers designed specifically for the analysis of metagenomes should be used to facilitate the creation of high-quality long contigs. Even with high coverage, scientists should not expect to always obtain complete genomes, because reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods to estimate viral diversity relevant. Although none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.
