Using genomic signatures for HIV-1 sub-typing

Background: Human Immunodeficiency Virus type 1 (HIV-1), the causative agent of Acquired Immune Deficiency Syndrome (AIDS), exhibits very high genetic diversity, with different variants or subtypes prevalent in different parts of the world. Proper classification of HIV-1 subtypes, which differ in infectivity, plays a major role in monitoring the epidemic and is a critical component of an effective treatment strategy. The existing methods for classifying HIV-1 sequence subtypes rely on phylogenetic analysis of specific genes or regions, and they have shown inconsistencies because they cannot analyse whole-genome variation. Several isolates remain unclassified because their sub-typing is unresolved. Classification based on complete genome sequences, rather than sub-genomic regions, is a more robust and comprehensive way to address genome-wide heterogeneity. However, no simple methodology exists that directly computes the HIV-1 subtype from the complete genome sequence.

Results: We use Chaos Game Representation (CGR) to identify the distinctive genomic signature associated with DNA sequence organisation in different HIV-1 subtypes. We first analysed the effect of nucleotide word length (k = 2 to 8) on whole genomes of HIV-1 M group sequences, and found that a word length of k = 6 was optimal for classifying HIV-1 subtypes in a Test sequence set. Using the optimised word length, we then showed accurate classification of the HIV-1 subtypes, both for the Reference Set sequences and for all available sequences in the database. Finally, we applied the approach to cluster five unclassified HIV-1 sequences from Africa and Europe and to predict their possible subtypes.

Conclusion: We propose a genomic signature-based approach, using CGR with suitable word length optimisation, that can be applied to classify intra-species variation, and we apply it to the complex problem of HIV-1 subtype classification. We demonstrate that CGR is a simple and computationally less intensive method that not only accurately segregates the HIV-1 subtypes and sub-subtypes, but also aids in classifying previously unclassified sequences. We hope that it will be useful for subtype annotation of newly sequenced HIV-1 genomes.
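To make the idea of a CGR-based genomic signature at word length k concrete, the following is a minimal Python sketch, not the authors' implementation. It counts every k-letter word in a sequence, places each count in the cell of a 2^k x 2^k grid that the chaos game would assign to that word, and normalises the grid to frequencies. The corner assignment (A, C, G, T at the four corners of the unit square) follows the common CGR convention and is an assumption here, as are the function names `kmer_cell`, `fcgr` and `euclidean`. The abstract does not state which distance measure was used to compare signatures, so the Euclidean distance below is only a placeholder.

```python
import math

# Assumed corner assignment (common CGR convention): A=(0,0), C=(0,1), G=(1,1), T=(1,0)
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Map a k-letter word to its (x, y) cell in the 2^k x 2^k CGR grid."""
    x = y = 0
    # The last base of the word selects the coarsest quadrant, so it must
    # become the most significant bit: process the word from its end.
    for base in reversed(kmer):
        cx, cy = CORNERS[base]
        x = (x << 1) | cx
        y = (y << 1) | cy
    return x, y


def fcgr(seq, k):
    """Return the k-mer frequency grid (rows indexed by y, columns by x)."""
    seq = seq.upper()
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip words containing ambiguous bases such as N.
        if any(base not in CORNERS for base in kmer):
            continue
        x, y = kmer_cell(kmer)
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid


def euclidean(grid_a, grid_b):
    """Placeholder distance between two signatures of the same word length."""
    return math.sqrt(
        sum((a - b) ** 2
            for row_a, row_b in zip(grid_a, grid_b)
            for a, b in zip(row_a, row_b))
    )
```

With signatures in hand, a pairwise distance matrix over a set of genomes could be passed to any standard clustering or tree-building tool. For whole HIV-1 genomes (roughly 9 to 10 kb) and k = 6, the grid has 4096 cells, which keeps the computation light.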
