Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges

Next-generation sequencing (NGS) is superseding Sanger technology for analysing intra-host viral populations, both in the genome length it can cover and in its resolution. We introduce two new empirical validation data sets and use them to test the available viral population assembly software. Two intra-host viral population (‘quasispecies’) samples, one of human immunodeficiency virus type 1 and one of hepatitis C virus, were Sanger-sequenced, and mixtures of plasmid clones at controlled proportions were shotgun-sequenced on Roche's 454 sequencing platform. The performance of the assemblers was compared against the Sanger clones in terms of phylogenetic clustering and recombination. Phylogenetic clustering showed that every assembler captured a proportion of the most divergent lineages, but none achieved high precision and high recall at the same time. Estimated variant frequencies correlated only mildly with the original proportions. Given the limitations of currently available algorithms revealed by our empirical validation, additional data sets need to be developed and exploited in order to establish an efficient framework for viral population reconstruction using NGS.
