Haplotype block structure and its applications to association studies: power and study designs.

Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of single-nucleotide polymorphisms (SNPs), referred to as "tag SNPs," can be used to distinguish a large fraction of the haplotypes. These tag SNPs are potentially very useful for association studies, because it may not be necessary to genotype all SNPs; however, their usefulness depends on how much power is lost. Here we develop a simulation study to quantitatively assess the power loss for a variety of study designs, including case-control designs and case-parental control designs. First, a number of data sets containing case-parental or case-control samples are generated on the basis of a disease model. Second, a small fraction of case and control individuals in each data set are genotyped at all the loci, and a dynamic programming algorithm is used to determine the haplotype blocks and the tag SNPs on the basis of the genotypes of the sampled individuals. Third, the statistical power of the tests is evaluated on the basis of three kinds of data: (1) all of the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, and (3) randomly chosen SNPs, equal in number to the tag SNPs, and the corresponding haplotypes. We study the power of different association tests with a variety of disease models and block-partitioning criteria. Our study indicates that genotyping effort can be significantly reduced by using the tag SNPs, without much loss of power. Depending on the specific haplotype block-partitioning algorithm and the disease model, when the identified tag SNPs are only 25% of all the SNPs, the power is reduced by only 4%, on average, compared with a power loss of approximately 12% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. When the identified tag SNPs are approximately 14% of all the SNPs, the power is reduced by approximately 9%, compared with a power loss of approximately 21% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. Our study also indicates that haplotype-based analysis can be much more powerful than marker-by-marker analysis.
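
To make the notion of tag SNPs concrete, the following is a minimal sketch, not the dynamic programming algorithm used in the study. It assumes that one block has already been delimited and that its haplotypes are given as equal-length 0/1 strings; it then searches exhaustively for the smallest set of SNP positions that still tells all distinct haplotypes apart. The function name `find_tag_snps` and the toy block are hypothetical, and the sketch requires every distinct haplotype to be distinguished, whereas the study only requires a large fraction.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest list of SNP indices that distinguishes
    every distinct haplotype in one block (brute-force search)."""
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    # Try subsets of increasing size, so the first hit is a minimum-size set.
    for k in range(n_snps + 1):
        for subset in combinations(range(n_snps), k):
            # Project each haplotype onto the candidate SNP positions.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return list(subset)
    return list(range(n_snps))


# Hypothetical block: 5 SNPs, 6 sampled chromosomes, 4 distinct haplotypes.
block = ["00000", "00000", "11100", "11100", "00011", "11111"]
tags = find_tag_snps(block)
print(tags, f"{len(tags)}/{len(block[0])} SNPs genotyped")
```

In this toy block, SNPs 0 to 2 always agree with one another, as do SNPs 3 and 4, so one SNP from each group carries the same information as all five. The exhaustive search is exponential in the number of SNPs and is only meant for illustration on small blocks.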
