An Evolutionary Model-Based Algorithm for Accurate Phylogenetic Breakpoint Mapping and Subtype Prediction in HIV-1

Genetically diverse pathogens (such as human immunodeficiency virus type 1, HIV-1) are frequently stratified into phylogenetically or immunologically defined subtypes for classification purposes. Computational identification of such subtypes is helpful in surveillance, epidemiological analysis and detection of novel variants, e.g., circulating recombinant forms in HIV-1. A number of conceptually and technically different techniques have been proposed for determining the subtype of a query sequence, but no single approach is universally optimal. We present a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial) sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains, and computing confidence levels for the inferred quantities. Our Subtype Classification Using Evolutionary ALgorithms (SCUEAL) procedure is shown to perform very well in a variety of simulation scenarios, runs in parallel when multiple sequences are screened, and matches or exceeds the performance of existing approaches on typical empirical cases. We applied SCUEAL to all available polymerase (pol) sequences from two large databases, the Stanford Drug Resistance Database and the UK HIV Drug Resistance Database. Comparison with previously assigned subtypes revealed that a small but non-negligible fraction (≈5%) of sequences classified as pure subtypes may in fact be intra- or inter-subtype recombinants. A free implementation of SCUEAL is provided as a module for the HyPhy package and the Datamonkey web server. Our method is especially useful when an accurate automatic classification of an unknown strain is desired, and is positioned to complement and extend faster but less accurate methods.
Given the increasingly frequent use of HIV subtype information in studies focusing on the effect of subtype on treatment, clinical outcome, pathogenicity and vaccine design, the importance of accurate, robust and extensible subtyping procedures is clear.
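The abstract does not spell out how candidate subtype and breakpoint assignments become confidence levels. The sketch below illustrates one way a model-based approach can do this, under stated assumptions: each candidate (a pure reference, or two parents joined at one breakpoint) is scored by a maximized likelihood, penalized for its parameter count with the small-sample Akaike information criterion (AICc), and the scores are converted into Akaike weights. Assumptions, none taken from the source: a per-segment binomial mismatch model (mismatch rate capped at 0.75) stands in for the phylogenetic likelihood the method actually uses; only one breakpoint is allowed, so candidates are enumerated exhaustively instead of being searched with an evolutionary algorithm; the breakpoint is counted as one parameter; and the function names (`segment_loglik`, `aicc`, `classify`) and toy sequences are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: under saturation a nucleotide site mismatches with probability
# 3/4, so the per-segment mismatch rate is capped there.
MAX_MISMATCH = 0.75


def segment_loglik(query: str, ref: str, start: int, end: int) -> float:
    """Maximized binomial log-likelihood of query/ref mismatches on sites
    [start, end), with the mismatch rate at its (capped) MLE."""
    n = end - start
    k = sum(1 for i in range(start, end) if query[i] != ref[i])
    p = min(k / n, MAX_MISMATCH)
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if k < n:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def aicc(loglik: float, n_params: int, n_obs: int) -> float:
    """Small-sample corrected AIC; infinite when the correction is undefined."""
    if n_obs - n_params - 1 <= 0:
        return math.inf
    penalty = 2 * n_params + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return -2.0 * loglik + penalty


def classify(query: str, refs: dict[str, str]) -> dict:
    """Score pure and single-breakpoint models (hypothetical toy scheme) and
    convert the AICc scores into Akaike weights (model-averaged confidence)."""
    length = len(query)
    if not refs or length < 3 or any(len(s) != length for s in refs.values()):
        raise ValueError("need references and aligned sequences of length >= 3")
    scores = {}
    # Pure-subtype models: one mismatch rate (1 parameter).
    for name, seq in refs.items():
        scores[(name,)] = aicc(segment_loglik(query, seq, 0, length), 1, length)
    # Recombinant models: parent a on [0, bp), parent b on [bp, length);
    # two mismatch rates plus the breakpoint position (3 parameters).
    for a, seq_a in refs.items():
        for b, seq_b in refs.items():
            if a == b:
                continue
            for bp in range(1, length):
                ll = (segment_loglik(query, seq_a, 0, bp)
                      + segment_loglik(query, seq_b, bp, length))
                scores[(a, b, bp)] = aicc(ll, 3, length)
    # Akaike weights: exp(-delta/2), normalized over all candidate models.
    best = min(scores.values())
    raw = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(raw.values())
    weights = {m: w / total for m, w in raw.items()}
    # Marginal support for each breakpoint position, summed over parent pairs.
    breakpoint_support = defaultdict(float)
    for m, w in weights.items():
        if len(m) == 3:
            breakpoint_support[m[2]] += w
    return {
        "best_model": min(scores, key=scores.get),
        "weights": weights,
        "recombinant_support": sum(breakpoint_support.values()),
        "breakpoint_support": dict(breakpoint_support),
    }


# Hypothetical toy references and query (not real HIV-1 data).
refs = {"A": "A" * 20, "B": "C" * 20}
result = classify("A" * 10 + "C" * 10, refs)
print(result["best_model"])  # ('A', 'B', 10)
```

Exhaustive enumeration already costs on the order of (number of references)² × (alignment length) models for a single breakpoint and grows combinatorially with more breakpoints, which is why a stochastic search such as a genetic algorithm becomes attractive for realistic reference panels. Note also that, even for a query identical to one reference, cheap recombinant explanations retain some weight, so confidence levels rather than a single best model are the informative output.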
