A multilocus linkage disequilibrium measure based on mutual information theory and its applications

Evaluating patterns of linkage disequilibrium (LD) is important for association mapping studies as well as for studying the architecture of the human genome (e.g., haplotype block structures). Commonly used bi-allelic pairwise measures for assessing LD between two loci, such as r² and D′, may not make full and efficient use of modern multilocus data. Although these measures have been extended to multilocus scenarios, their performance remains questionable. Meanwhile, most existing measures for an entire multilocus region, such as the normalized entropy difference, do not account for LD heterogeneity across the region under investigation. Additionally, these existing multilocus measures cannot handle distant regions where long-range LD patterns may exist. In this study, we propose a novel multilocus LD measure based on mutual information theory. The proposed measure describes the LD pattern between two chromosome regions, each of which may consist of multiple loci (including multi-allelic loci). As such, it can better characterize LD patterns between two arbitrary regions. As potential applications, we developed algorithms based on the proposed measure for partitioning haplotype blocks and for selecting haplotype tagging SNPs (htSNPs), which are helpful for follow-up association tests. Results on both simulated and empirical data showed that our LD measure has distinct advantages over pairwise and other multilocus measures. First, our measure is more robust and comprehensively captures the LD information between neighboring as well as disjoint regions. Second, haplotype blocks are better described by our proposed measure. Furthermore, association tests with htSNPs selected by the proposed algorithm had improved power over tests on single markers and on haplotypes.
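To illustrate the general idea of a mutual-information-based LD measure between two multilocus regions, the following is a minimal sketch and not the measure defined in the paper. It assumes phased haplotypes are available, estimates haplotype frequencies by simple counting, and normalizes the mutual information by the smaller of the two regional entropies. The function names (`entropy`, `region_mutual_information`) and the choice of normalization are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    n = len(items)
    counts = Counter(items)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def region_mutual_information(haplotypes, region_a, region_b):
    """Mutual information between two regions of phased haplotypes.

    haplotypes: sequence of equal-length sequences of alleles (e.g. strings)
    region_a, region_b: lists of locus indices defining the two regions
    Returns (mi, normalized_mi). The normalization by min(H_A, H_B) is an
    assumption of this sketch, not necessarily the paper's definition.
    """
    # Treat each region's multilocus haplotype as one multi-allelic variable.
    a = [tuple(h[i] for i in region_a) for h in haplotypes]
    b = [tuple(h[i] for i in region_b) for h in haplotypes]
    joint = list(zip(a, b))

    h_a, h_b, h_ab = entropy(a), entropy(b), entropy(joint)
    mi = h_a + h_b - h_ab  # I(A; B) = H(A) + H(B) - H(A, B)

    denom = min(h_a, h_b)
    normalized = mi / denom if denom > 0 else 0.0
    return mi, normalized


if __name__ == "__main__":
    # Hypothetical toy data: loci 0-1 form region A, loci 2-3 form region B.
    linked = ["0011", "0011", "1100", "1100"]
    unlinked = ["0000", "0011", "1100", "1111"]
    print(region_mutual_information(linked, [0, 1], [2, 3]))
    print(region_mutual_information(unlinked, [0, 1], [2, 3]))
```

Because each region is collapsed into a single haplotype-valued variable, the same function applies to adjacent or distant regions and to loci with more than two alleles. In practice, haplotype frequencies would usually have to be estimated from unphased genotype data, which this sketch does not attempt.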
