Viral quasispecies inference from 454 pyrosequencing

Background: Many potentially life-threatening viruses are highly mutable. Characterizing the fittest variants within a quasispecies from infected patients is expected to offer unprecedented opportunities to investigate the relationship between quasispecies diversity and disease epidemiology. The advent of next-generation sequencing technologies has allowed virus diversity to be studied with high-throughput sequencing, although these methods have higher error rates, which can artificially inflate the observed diversity.

Results: Here we introduce a novel computational approach that incorporates base quality scores from next-generation sequencers to reconstruct viral genome sequences while simultaneously inferring the number of variants present within a quasispecies. Comparisons on simulated and clinical dengue virus data suggest that the approach infers the underlying number of variants within the quasispecies more accurately, which is vital for clinical efforts to map within-host viral diversity. Sequence alignments generated by our approach are also found to exhibit lower error rates.

Conclusions: The ability to infer the viral quasispecies colony present within a human host offers the potential for a more accurate classification of the viral phenotype. Understanding the genomics of viruses is relevant not only to studying how to control or even eradicate these viral infectious diseases, but also to learning about the human host's innate protection against the viruses.
