Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling

MOTIVATION Next-generation sequencing (NGS) makes it possible to analyze large numbers of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples remain highly cost- and labor-intensive. Combinatorial pooling is one possible cost-effective alternative. Although a number of pooling strategies have been proposed for consensus sequencing of DNA samples and detection of SNPs, these strategies cannot be applied to sequencing of highly heterogeneous viral populations.

RESULTS We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework. The framework includes algorithms for the design of optimal virus-specific pools and for deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy.

AVAILABILITY AND IMPLEMENTATION The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.
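To make the general idea of combinatorial pooling concrete, the following is a minimal Python sketch of a textbook-style scheme, not the virus-specific pool design or deconvolution algorithms of this framework. It assumes that every sample is placed in a distinct subset of pools (its signature) and that each variant occurs in exactly one sample, so the set of pools in which a variant is observed identifies its source sample. The function names `design_pools`, `mix_pools` and `deconvolve`, and the toy variant labels, are hypothetical.

```python
from itertools import combinations


def design_pools(n_samples, n_pools, k):
    """Assign each sample a distinct set of k pool indices (its signature)."""
    signatures = list(combinations(range(n_pools), k))
    if len(signatures) < n_samples:
        raise ValueError("too few pools for distinct signatures")
    return {s: frozenset(signatures[s]) for s in range(n_samples)}


def mix_pools(sample_variants, design, n_pools):
    """Simulate sequencing: each pool holds the union of its samples' variants."""
    pools = {p: set() for p in range(n_pools)}
    for sample, variants in sample_variants.items():
        for p in design[sample]:
            pools[p] |= set(variants)
    return pools


def deconvolve(pool_contents, design):
    """Map each variant to the sample whose signature equals the set of
    pools in which the variant was observed (None if there is no match)."""
    by_signature = {sig: s for s, sig in design.items()}
    observed = {}
    for pool, variants in pool_contents.items():
        for v in variants:
            observed.setdefault(v, set()).add(pool)
    return {v: by_signature.get(frozenset(p)) for v, p in observed.items()}


# Toy example: 6 samples sequenced in 4 pools instead of 6 separate runs.
design = design_pools(n_samples=6, n_pools=4, k=2)
sample_variants = {s: {f"s{s}_v{i}" for i in range(3)} for s in range(6)}
pools = mix_pools(sample_variants, design, n_pools=4)
assignment = deconvolve(pools, design)
```

In this toy setting every variant is assigned back to its source sample. Real viral data break the simplifying assumptions: variants can be shared between samples, sequencing errors create spurious variants, and minor variants can be missed in some pools, which is why dedicated design and deconvolution algorithms are needed.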
