Measuring and partitioning the high‐order linkage disequilibrium by multiple order Markov chains

A map of the background levels of disequilibrium between nearby markers can be useful for association mapping studies. To assess the background levels of linkage disequilibrium (LD), multilocus LD measures are more advantageous than pairwise LD measures, because a combined analysis of pairwise LD measures is not adequate to detect simultaneous allele associations among multiple markers. Various multilocus LD measures based on haplotypes have been proposed. However, most of these measures provide a single index of association among multiple markers and do not reveal the complex patterns and different levels of LD structure. In this paper, we employ non‐homogeneous, multiple order Markov chain (MC) models as a statistical framework to measure the LD among multiple markers and to partition it into components due to different orders of marker associations. Using a sliding window of multiple markers on phased haplotype data, we compute the likelihoods for the different MC orders in each window. The log‐likelihood difference between the lowest order model (MC0) and the highest order model in each window is used as a measure of the total LD, that is, the overall deviation from gametic equilibrium for the window. We then partition the total LD into lower order disequilibria and estimate the effects of two‐, three‐, and higher order disequilibria. The relationship between different orders of LD and the log‐likelihood difference between two MC models of different orders is explored. By applying our method to the phased haplotype data in the ENCODE regions of the HapMap project, we are able to identify regions of high and low multilocus LD. Our results reveal that most of the LD in the HapMap data is attributable to LD between adjacent pairs of markers across the whole region. LD between adjacent pairs of markers appears to be more significant in high multilocus LD regions than in low multilocus LD regions. We also find that as the multilocus total LD increases, the effects of high‐order LD tend to weaken because of the lack of observed multilocus haplotypes. The overall estimates of first, second, third, and fourth order LD across the ENCODE regions are 64, 23, 9, and 3%, respectively. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
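
The window-level computation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes phased haplotypes are given as equal-length strings of allele codes, that each non‐homogeneous MC model is fitted by maximum likelihood from observed counts, and that the total LD is partitioned by taking differences between consecutive MC orders. That last choice is an assumption made here for illustration; the paper's exact partition may differ. The function names `mc_log_likelihood`, `partition_ld`, and `sliding_total_ld` are hypothetical.

```python
from collections import Counter
from math import log


def mc_log_likelihood(haplotypes, order):
    """Maximised log-likelihood of a non-homogeneous Markov chain of the
    given order, fitted to phased haplotypes (equal-length strings)."""
    n_markers = len(haplotypes[0])
    total = 0.0
    for j in range(n_markers):
        # Marker j is conditioned on at most `order` preceding markers.
        start = max(0, j - order)
        joint = Counter(h[start:j + 1] for h in haplotypes)
        context = Counter(h[start:j] for h in haplotypes)
        for key, n in joint.items():
            # ML estimate of P(allele at j | context) is a ratio of counts.
            total += n * log(n / context[key[:-1]])
    return total


def partition_ld(haplotypes):
    """Return (total LD, per-order components) for one window.

    Total LD is the log-likelihood gain of the highest order model over
    MC0. Components are gains between consecutive orders (an assumed
    partition), so they sum to the total by construction.
    """
    n_markers = len(haplotypes[0])
    ll = [mc_log_likelihood(haplotypes, k) for k in range(n_markers)]
    total = ll[-1] - ll[0]
    components = [ll[k] - ll[k - 1] for k in range(1, n_markers)]
    return total, components


def sliding_total_ld(haplotypes, width):
    """Total LD for every window of `width` consecutive markers."""
    n_markers = len(haplotypes[0])
    return [
        partition_ld([h[s:s + width] for h in haplotypes])[0]
        for s in range(n_markers - width + 1)
    ]
```

In this sketch, a window in which all allele combinations occur in equilibrium proportions gives a total of zero, and the first component captures the gain from conditioning each marker on its immediate neighbor, which corresponds to LD between adjacent pairs of markers.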
