Comparison of strategies for selecting single nucleotide polymorphisms for case/control association studies

It is widely believed that a subset of single nucleotide polymorphisms (SNPs) can capture most of the information for genotype-phenotype association studies that is contained in the complete complement of genetic variation. The question remains: how should that subset of SNPs be selected to maximize the power to detect a significant association? In this study, we used a simulation approach to compare three competing methods of site selection: random selection, selection based on pair-wise linkage disequilibrium, and selection based on maximizing haplotype diversity. The results indicate that site selection based on maximizing haplotype diversity is preferable to both random selection and selection based on pair-wise linkage disequilibrium. The results also indicate that increasing the sample size is a more prudent way to improve a study's power than continuing to increase the number of SNPs. These results have direct implications for the design of gene-based and genome-wide association studies.
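To make the third strategy concrete, the following is a minimal sketch of selecting sites by maximizing haplotype diversity. It is not the procedure used in the study, whose details are not given here. It assumes diversity is measured as 1 minus the sum of squared haplotype frequencies and that sites are added greedily. The function names `haplotype_diversity` and `greedy_select` and the toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes, sites):
    """Return 1 - sum(p_i^2) over the distinct haplotypes seen at `sites`.

    Assumption: this diversity measure is chosen for illustration only.
    """
    n = len(haplotypes)
    # Restrict every haplotype to the chosen sites and count distinct patterns.
    counts = Counter(tuple(h[s] for s in sites) for h in haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def greedy_select(haplotypes, k):
    """Greedily pick up to k sites, each time adding the site that yields
    the highest diversity together with the sites already chosen."""
    n_sites = len(haplotypes[0])
    chosen = []
    for _ in range(min(k, n_sites)):
        remaining = [s for s in range(n_sites) if s not in chosen]
        best = max(
            remaining,
            key=lambda s: haplotype_diversity(haplotypes, chosen + [s]),
        )
        chosen.append(best)
    return chosen


# Toy data (hypothetical): six phased haplotypes over four biallelic SNPs.
haplotypes = ["0000", "0000", "1100", "1100", "1111", "0011"]

selected = greedy_select(haplotypes, 2)
full = haplotype_diversity(haplotypes, [0, 1, 2, 3])
subset = haplotype_diversity(haplotypes, selected)
print(selected, round(subset, 4), round(full, 4))
```

In this toy data, sites 0 and 1 are perfectly correlated, as are sites 2 and 3, so the greedy pass skips the redundant partner and two sites recover the same diversity as all four. A random pick could easily choose both members of a correlated pair and lose information.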
