HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

Abstract: To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load and was more frequent for amplicons closer to the 3′ end of the genome. It was not associated with subtype diversity, with the exception of HIV-1 subtype D, and it remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a taxon sampling threshold below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction; this impact is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must nevertheless proceed cautiously, because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
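The sequencing success criterion above (more than 80% of the genome between gag and nef determined) can be illustrated with a minimal sketch. This is not the consortium's pipeline: the function names, the toy sequence, the region coordinates, and the choice to count only unambiguous A, C, G, T calls as determined are all assumptions made for illustration.

```python
# Hypothetical sketch: fraction of a genomic region with determined characters.
# Assumption: only unambiguous A/C/G/T count as determined; 'N', '?', '-'
# and IUPAC ambiguity codes are treated as missing.
RESOLVED = set("ACGT")


def coverage_fraction(seq: str, start: int, end: int) -> float:
    """Return the fraction of positions in seq[start:end] that are resolved."""
    region = seq[start:end].upper()
    if not region:
        raise ValueError("region is empty")
    called = sum(1 for base in region if base in RESOLVED)
    return called / len(region)


def passes_threshold(seq: str, start: int, end: int,
                     min_fraction: float = 0.8) -> bool:
    """True if strictly more than min_fraction of the region is resolved."""
    return coverage_fraction(seq, start, end) > min_fraction


# Toy consensus sequence: 16 resolved bases, 4 missing, 20 resolved bases.
toy = "ACGT" * 4 + "NNNN" + "ACGT" * 5

whole = coverage_fraction(toy, 0, len(toy))   # 36 of 40 positions resolved
patch = coverage_fraction(toy, 16, 24)        # 4 of 8 positions resolved
```

In practice the region boundaries would be the gag start and nef end in the coordinates of the alignment being used, which this sketch leaves to the caller.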
