POPULATION SEQUENCING FROM CHROMATOGRAM DATA

, gene) from a sample. Traditionally, this is achieved chemically, for example through the use of specific primer sequences. However, the same primer may pick up multiple related species. This is especially problematic when sequencing RNA or proviral DNA, where the virus in question is highly variable and each individual is infected with a different swarm of viral strains. In the case of HIV, when dominant sequences in the population differ by one or more insertions or deletions, standard sequencing techniques fail to recover any of the component strains satisfactorily. For example, in regions of HIV such as the envelope proteins (which are subjected to strong selective pressures from host defenses, resulting in multiple mutations, insertions, and deletions), most chromatograms obtained by DNA sequencers have unusable sections. We show that chromatograms of mixed sequences, such as the segment shown in Figure 1, can be used to accurately infer the individual strains, thus eliminating the need for additional sequencing steps (
