A penalized regression approach to haplotype reconstruction of viral populations arising in early HIV/SIV infection

Motivation: Next generation sequencing (NGS) has been increasingly applied to characterize viral evolution during HIV and SIV infections. In particular, NGS datasets sampled during the initial months of infection are characterized by relatively low levels of diversity as well as convergent evolution at multiple loci dispersed across the viral genome. Consequently, fully characterizing viral evolution from NGS datasets requires haplotype reconstruction across large regions of the viral genome. Existing haplotype reconstruction algorithms have not been developed with the particular characteristics of early HIV/SIV infection in mind, raising the possibility that better performance could be achieved through a specifically designed algorithm.

Results: Here, we introduce a haplotype reconstruction algorithm, RegressHaplo, specifically designed for low diversity and convergent evolution regimes. The algorithm uses a penalized regression that balances a data fitting term with a penalty term that encourages solutions with few haplotypes. The regression covariates are a large set of potential haplotypes and fitting the regression is made computationally feasible by the low diversity setting. Using simulated and in vivo datasets, we compare RegressHaplo to PredictHaplo and QuRe, two existing haplotype reconstruction algorithms. RegressHaplo performs better than these algorithms on simulated datasets with relatively low diversity levels. We suggest RegressHaplo as a novel tool for the investigation of early infection HIV/SIV datasets and, more generally, low diversity viral NGS datasets.

Contact: sr286@georgetown.edu

Availability and Implementation: https://github.com/SLeviyang/RegressHaplo
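
The penalized regression described under Results can be illustrated with a small toy sketch. It is not the RegressHaplo implementation: the function names (`restrict`, `fit_haplotypes`), the read-window encoding, the linear penalty `rho * sum(h)` on non-negative weights, and the projected-gradient solver are all assumptions chosen for brevity, and the penalty and optimizer actually used by RegressHaplo may differ. The sketch shows only the general shape of the idea: candidate haplotypes act as covariates, observed pattern frequencies in read windows act as the response, and a penalty pushes most candidate weights to zero.

```python
from itertools import product


def restrict(hap, window):
    """Return the nucleotides of a haplotype at the positions in a window."""
    return "".join(hap[i] for i in window)


def fit_haplotypes(candidates, windows, observed, rho=0.02, iters=5000):
    """Toy penalized regression (hypothetical, not RegressHaplo's own solver).

    Minimizes 0.5 * ||P h - y||^2 + rho * sum(h) subject to h >= 0 by
    projected gradient descent, then renormalizes h to frequencies.
    """
    # One regression row per (window, pattern) that any candidate can produce.
    features = [
        (w, pat)
        for w in windows
        for pat in sorted({restrict(c, w) for c in candidates})
    ]
    # P[i][j] = 1 if candidate j shows pattern i in that window, else 0.
    P = [
        [1.0 if restrict(c, w) == pat else 0.0 for c in candidates]
        for (w, pat) in features
    ]
    # Observed pattern frequencies; unobserved patterns count as 0.
    y = [observed[w].get(pat, 0.0) for (w, pat) in features]

    # Safe step size: ||P||_2^2 <= (max row sum) * (max column sum).
    step = 1.0 / (max(sum(r) for r in P) * max(sum(c) for c in zip(*P)))

    h = [1.0 / len(candidates)] * len(candidates)
    for _ in range(iters):
        resid = [sum(p * x for p, x in zip(row, h)) - t for row, t in zip(P, y)]
        grad = [
            sum(row[j] * r for row, r in zip(P, resid)) + rho
            for j in range(len(h))
        ]
        # Gradient step followed by projection onto h >= 0.
        h = [max(0.0, x - step * g) for x, g in zip(h, grad)]

    total = sum(h)
    return {c: x / total for c, x in zip(candidates, h) if x > 1e-8}


# Toy data: three variable positions, two alleles each, eight candidates.
candidates = ["".join(p) for p in product("AG", repeat=3)]
windows = [(0, 1), (1, 2)]  # positions covered jointly by reads
observed = {
    (0, 1): {"AA": 0.7, "GG": 0.3},
    (1, 2): {"AA": 0.7, "GA": 0.3},
}
freqs = fit_haplotypes(candidates, windows, observed)
print(freqs)
```

In this toy input only the candidates `AAA` and `GGA` retain non-zero weight. The penalty shrinks the raw weights slightly before renormalization, so the reported frequencies are close to, but not exactly, 0.7 and 0.3.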
