Whole genome composition distance for HIV-1 genotyping.

Existing HIV-1 genotyping systems depend on a computationally expensive multiple sequence alignment phase, and the alignments must be of sufficiently high quality for the genotyping to be accurate. This is especially challenging when the number of strains is large. Here we propose a whole genome composition distance (WGCD) that measures the evolutionary closeness between two HIV-1 whole genomic RNA sequences, and we use this measure to develop an HIV-1 genotyping system. The WGCD-based genotyping system avoids multiple sequence alignment and requires no prior knowledge of evolutionary rates. In our experiments, the system correctly identified the known subtypes, sub-subtypes, and individual circulating recombinant forms.
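The abstract does not give the WGCD formula, so the following is only a minimal sketch of the general idea of an alignment-free composition distance, not the authors' method. It assumes that each genome is summarized by its k-mer counts, that the distance is one minus the cosine similarity of those count vectors, and that a query is assigned the label of its nearest reference. The function names, the choice of k, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a nucleotide string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_distance(a, b, k=3):
    """Illustrative distance: 1 - cosine similarity of k-mer count vectors.

    Assumption: this stands in for a composition-based measure in general;
    it is NOT the WGCD definition from the paper.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both sequences must be at least k characters long")
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def assign_genotype(query, references, k=3):
    """Return the label of the reference sequence closest to the query."""
    return min(
        references,
        key=lambda label: composition_distance(query, references[label], k),
    )


# Hypothetical toy data; real inputs would be whole HIV-1 genomes.
references = {"X": "ACGTACGTACGT", "Y": "GGGGCCCCGGGGCCCC"}
label = assign_genotype("ACGTACGAACGT", references)
```

No alignment is computed at any point: each sequence is reduced to a count vector independently, so the cost of comparing a query against many references grows with sequence length and the number of references rather than with the cost of a joint alignment.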
