A Markov Chain Approach to Reconstruction of Long Haplotypes

Haplotypes are important for association based gene mapping, but there are no practical laboratory methods for obtaining them directly from DNA samples. We propose simple Markov models for reconstruction of haplotypes for a given sample of multilocus genotypes. The models are aimed specifically for long marker maps, where linkage disequilibrium between markers may vary and be relatively weak. Such maps are ultimately used in chromosome or genome-wide association studies. Haplotype reconstruction with standard Markov chains is based on linkage disequilibrium (LD) between neighboring markers. Markov chains of higher order can capture LD in a neighborhood of a given size. We introduce a more flexible and robust model, MC-VL, which is based on a Markov chain of variable order. Experimental validation of the Markov chain methods on both a wide range of simulated data and real data shows that they clearly out perform previous methods on genetically long marker maps and are highly competitive with short maps, too. MC-VL performs well across different data sets and settings while avoiding the problem of manually choosing an appropriate order for the Markov chain, and it has low computational complexity.

[1]  A. Clark,et al.  Inference of haplotypes from PCR-amplified samples of diploid populations. , 1990, Molecular biology and evolution.

[2]  Dana Ron,et al.  The Power of Amnesia , 1993, NIPS.

[3]  L. Excoffier,et al.  Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. , 1995, Molecular biology and evolution.

[4]  N Risch,et al.  The Future of Genetic Studies of Complex Human Diseases , 1996, Science.

[5]  L. Kruglyak Prospects for whole-genome linkage disequilibrium mapping of common disease genes , 1999, Nature Genetics.

[6]  Dan Gusfield,et al.  Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms , 2001, J. Comput. Biol..

[7]  M. Xiong,et al.  Haplotypes vs single marker linkage disequilibrium tests: what do we gain? , 2001, European Journal of Human Genetics.

[8]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[9]  M. Daly,et al.  High-resolution haplotype structure in the human genome , 2001, Nature Genetics.

[10]  Zhaohui S. Qin,et al.  Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. , 2002, American journal of human genetics.

[11]  A. Chakravarti,et al.  Haplotype inference in random population samples. , 2002, American journal of human genetics.

[12]  Zhaohui S. Qin,et al.  Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. , 2002, American journal of human genetics.

[13]  Dan Gusfield,et al.  Haplotyping as perfect phylogeny: conceptual framework and efficient solutions , 2002, RECOMB '02.

[14]  George Karypis,et al.  Evaluation of Techniques for Classifying Biological Sequences , 2002, PAKDD.

[15]  Richard M. Karp,et al.  Large scale reconstruction of haplotypes from genotype data , 2003, RECOMB '03.