Detection of viral sequence fragments of HIV-1 subfamilies yet unknown

Background

Methods for determining whether a particular HIV-1 sequence stems, completely or in part, from an unknown HIV-1 subtype are important for the design of vaccines and molecular detection systems, as well as for epidemiological monitoring. Nevertheless, only a single algorithm, the Branching Index (BI), has been developed for this task so far. Moving along the genome of a query sequence in a sliding window, the BI computes a ratio quantifying how closely the query sequence clusters with a subtype clade. In its current version, however, the BI does not provide predicted boundaries of unknown fragments.

Results

We have developed Unknown Subtype Finder (USF), an algorithm based on a probabilistic model, which automatically determines which parts of an input sequence originate from a subtype yet unknown. The underlying model is based on a simple profile hidden Markov model (pHMM) for each known subtype and an additional pHMM for an unknown subtype. The emission probabilities of the latter are estimated from the emission frequencies of the known subtypes by means of a position-wise probabilistic model for the emergence of new subtypes. We have applied USF to SIV and HIV-1 sequences formerly classified as having emerged from an unknown subtype. Moreover, we have evaluated its performance on artificial HIV-1 recombinants and non-recombinant HIV-1 sequences. The results have been compared with the corresponding results of the BI.

Conclusions

Our results demonstrate that USF is suitable for detecting segments in HIV-1 sequences stemming from yet unknown subtypes. A comparison of USF with the BI shows that our algorithm performs as well as the BI or better.
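To make the idea of one profile per known subtype plus an additional "unknown" profile more concrete, the following Python sketch segments a query with the Viterbi algorithm and reports the boundaries of the stretches assigned to the unknown state. It is a simplified stand-in under several assumptions and not the USF implementation: the query is taken to be already aligned to the profile columns (so the insert and delete states of a real pHMM are omitted), the unknown emissions are a placeholder (the average of the known-subtype frequencies mixed with a uniform distribution) rather than the position-wise emergence model referred to above, and all function names and the parameters `pseudo`, `mix` and `stay` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def profile_from_alignment(seqs, pseudo=1.0):
    """Per-column emission probabilities from equal-length aligned sequences."""
    profile = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        total = len(seqs) + pseudo * len(ALPHABET)
        profile.append({b: (column.count(b) + pseudo) / total for b in ALPHABET})
    return profile


def unknown_profile(known_profiles, mix=0.5):
    """Placeholder 'unknown' emissions (an assumption, not the paper's model):
    average of the known subtypes, mixed with a uniform distribution."""
    out = []
    for i in range(len(known_profiles[0])):
        col = {}
        for b in ALPHABET:
            avg = sum(p[i][b] for p in known_profiles) / len(known_profiles)
            col[b] = (1.0 - mix) * avg + mix / len(ALPHABET)
        out.append(col)
    return out


def viterbi_path(query, profiles, stay=0.99):
    """Most probable state (profile name) for each position of the query."""
    names = list(profiles)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (len(names) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    score = {s: -math.log(len(names)) + math.log(profiles[s][0][query[0]])
             for s in names}
    back = []
    for i in range(1, len(query)):
        new_score, ptr = {}, {}
        for s in names:
            best = max(names, key=lambda r: score[r] + trans(r, s))
            ptr[s] = best
            new_score[s] = (score[best] + trans(best, s)
                            + math.log(profiles[s][i][query[i]]))
        back.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def labelled_segments(path, label):
    """Half-open (start, end) intervals of positions assigned to `label`."""
    segments, start = [], None
    for i, s in enumerate(path):
        if s == label and start is None:
            start = i
        elif s != label and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


def find_unknown_segments(query, subtype_alignments):
    """Build all profiles and return the predicted unknown segments."""
    profiles = {name: profile_from_alignment(seqs)
                for name, seqs in subtype_alignments.items()}
    profiles["unknown"] = unknown_profile(list(profiles.values()))
    return labelled_segments(viterbi_path(query, profiles), "unknown")


if __name__ == "__main__":
    # Toy data: two artificial "subtypes" and a query with a foreign middle part.
    toy = {"A": ["A" * 60] * 20, "B": ["C" * 60] * 20}
    print(find_unknown_segments("A" * 20 + "G" * 20 + "A" * 20, toy))
```

Unlike a sliding-window score, a Viterbi segmentation of this kind yields explicit start and end positions for each predicted fragment. The `stay` parameter controls how much evidence is needed before the path switches states, so short or weakly divergent stretches stay assigned to a known subtype.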
