Viral Quasispecies Assembly via Maximal Clique Enumeration

Genetic variability of virus populations within individual hosts is a key determinant of pathogenesis, virulence, and treatment outcome. It is of clinical importance to identify and quantify the intra-host ensemble of viral haplotypes, called viral quasispecies. Ultra-deep next-generation sequencing (NGS) of mixed samples is currently the only efficient way to probe the genetic diversity of virus populations in greater detail. Major challenges with this bulk sequencing approach are (i) to distinguish genetic diversity from sequencing errors, (ii) to assemble an unknown number of different, unknown haplotype sequences over a genomic region larger than the average read length, (iii) to estimate their frequency distribution, and (iv) to detect structural variants, such as large insertions and deletions (indels) that are due to erroneous replication or alternative splicing. Even though NGS is currently being introduced into clinical diagnostics, the de facto standard procedure for assessing the quasispecies structure is still single-nucleotide variant (SNV) calling. Viral phenotypes cannot be predicted solely from individual SNVs, as epistatic interactions are abundant in RNA viruses. Therefore, reconstruction of long-range viral haplotypes has the potential to be adopted, as the data are already available.
