Evolutionary‐based grouping of haplotypes in association analysis

Haplotypes carry more information about the underlying polymorphisms than genotypes at individual SNPs do, and are therefore considered a more informative data format for association analysis. Modeling haplotypes, however, requires many degrees of freedom, which can reduce power and limit a model's capacity to incorporate other complex effects, such as gene‐gene interactions. Even within haplotype blocks, the number of degrees of freedom remains a concern unless one chooses to discard rare haplotypes. To increase the efficiency and power of haplotype analysis, we adapt the evolutionary concepts of cladistic analysis and propose a grouping algorithm that clusters rare haplotypes with their corresponding ancestral haplotypes. The algorithm determines the cluster bases by preserving common haplotypes, using a criterion built on the Shannon information content. Each haplotype is then assigned to its appropriate clusters probabilistically according to the cladistic relationship. Association analysis is then performed on the resulting groups of haplotypes. Simulation results indicate that tests on the haplotype clusters have greater power than tests using the original haplotypes or the truncated haplotype distribution. Genet. Epidemiol. © 2005 Wiley‐Liss, Inc.
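
The grouping idea can be illustrated with a minimal sketch. The abstract does not give the exact criterion or assignment probabilities, so everything specific here is an assumption made for illustration: haplotypes are strings of 0/1 alleles, a haplotype is kept as a cluster base when its Shannon information content, -log2(p), is at most a hypothetical cutoff `max_bits`, and each remaining haplotype is shared among its nearest bases (by Hamming distance, as a crude stand-in for the cladistic relationship) in proportion to the base frequencies.

```python
import math


def surprisal_bits(p):
    """Shannon information content of an outcome with probability p, in bits."""
    return -math.log2(p)


def hamming(a, b):
    """Number of sites at which two equal-length haplotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def choose_cores(freqs, max_bits=2.0):
    """Keep common haplotypes as cluster bases.

    ASSUMPTION: the cutoff rule (surprisal <= max_bits) is hypothetical;
    the paper's actual criterion is not given in the abstract.
    """
    return [h for h, p in freqs.items() if p > 0 and surprisal_bits(p) <= max_bits]


def assign_to_cores(hap, cores, freqs):
    """Return {core: probability} for one haplotype.

    ASSUMPTION: nearest cores by Hamming distance, ties split in
    proportion to core frequency.
    """
    if not cores:
        raise ValueError("no cluster bases selected; raise max_bits")
    if hap in cores:
        return {hap: 1.0}
    dists = {c: hamming(hap, c) for c in cores}
    nearest = min(dists.values())
    candidates = [c for c in cores if dists[c] == nearest]
    total = sum(freqs[c] for c in candidates)
    return {c: freqs[c] / total for c in candidates}


def grouped_frequencies(freqs, max_bits=2.0):
    """Collapse haplotype frequencies onto the cluster bases."""
    cores = choose_cores(freqs, max_bits)
    grouped = {c: 0.0 for c in cores}
    for hap, p in freqs.items():
        for core, w in assign_to_cores(hap, cores, freqs).items():
            grouped[core] += p * w
    return grouped


# Made-up frequencies for five 4-SNP haplotypes (illustration only).
example_freqs = {"0000": 0.50, "1100": 0.30, "0001": 0.08, "1000": 0.07, "1101": 0.05}
example_groups = grouped_frequencies(example_freqs)
```

In this toy input, five haplotype categories collapse to two clusters, so a test on the grouped frequencies would use fewer degrees of freedom than a test on the original haplotypes, while the rare haplotypes still contribute instead of being discarded.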
