Population Sequencing Using Short Reads: HIV as a Case Study

Despite many drawbacks, traditional sequencing technologies have proven invaluable in modern medical research, even when the targeted genomes are highly variable. In such cases it is often known that the analyzed sample contains multiple slightly different sequences at dramatically different concentrations, yet traditional techniques typically allow only the dominant strain to be extracted from a single chromatogram. These limitations have made some research directions difficult to pursue. For example, the analysis of HIV evolution in a single patient, including the emergence of drug resistance, is expected to benefit from a comprehensive catalog of the patient's HIV population. In this paper, we show how the new generation of sequencing technologies, based on high-throughput sequencing of short reads, can be used to link site variants and reconstruct multiple full strains of the targeted gene, including those present at low concentration in the sample. Our algorithm is based on a generative model of the sequencing process and uses a tailored probabilistic inference and learning procedure to fit the model to the obtained reads.
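As a rough illustration of what fitting a generative model to short reads can look like, the sketch below estimates strain concentrations with a plain expectation-maximization loop. It is not the algorithm of this paper: it assumes the candidate strains are already known (the paper reconstructs them), that each read comes with a known start position and lies fully inside the gene, and that sequencing errors are independent substitutions at a fixed rate. The names `read_likelihoods`, `estimate_frequencies`, and `error_rate` are hypothetical and chosen only for this example.

```python
import numpy as np

def read_likelihoods(reads, strains, error_rate=0.01):
    """Return P(read | strain) for reads given as (start, sequence) pairs.

    Assumed error model: each base is read correctly with probability
    1 - error_rate and replaced by one of the 3 other bases otherwise.
    Reads are assumed to lie entirely within each strain.
    """
    lik = np.zeros((len(reads), len(strains)))
    for i, (start, seq) in enumerate(reads):
        for k, strain in enumerate(strains):
            window = strain[start:start + len(seq)]
            mismatches = sum(a != b for a, b in zip(seq, window))
            matches = len(seq) - mismatches
            lik[i, k] = (1 - error_rate) ** matches * (error_rate / 3) ** mismatches
    return lik

def estimate_frequencies(reads, strains, error_rate=0.01, n_iter=200):
    """EM estimate of strain frequencies in a mixture of known strains."""
    lik = read_likelihoods(reads, strains, error_rate)
    freqs = np.full(len(strains), 1.0 / len(strains))  # uniform start
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each strain.
        resp = lik * freqs
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new frequencies are the average responsibilities.
        freqs = resp.mean(axis=0)
    return freqs

# Toy data: two strains that differ at a single site (index 3).
strains = ["ACGTACGTAC", "ACGAACGTAC"]
reads = [
    (2, "GTAC"), (2, "GTAC"), (2, "GTAC"),  # support the first strain
    (2, "GAAC"),                            # supports the second strain
    (6, "GTAC"),                            # matches both, uninformative
]
freqs = estimate_frequencies(reads, strains)
print(freqs)
```

In this toy setting the read that matches both strains is split between them according to the current frequency estimate, which is how reads that do not cover a variant site still take part in the fit without biasing it. Linking variants at distant sites and discovering unknown strains, which the abstract describes, would need a richer model than this one.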
