The power of genome-wide association studies of complex disease genes: statistical limitations of indirect approaches using SNP markers

Abstract

Genome-wide association studies using a dense map of single nucleotide polymorphism (SNP) markers are expected to enable the detection of many complex disease genes. In such indirect association studies, whether a susceptibility gene can be detected depends not only on the degree of linkage disequilibrium between the disease variant and the SNP marker but also on the difference in their allele frequencies. These factors, together with the penetrance of the disease variant, determine the statistical power of the approach. However, the power of indirect association studies is not well understood. We calculated the number of individuals needed to detect the disease variant in both direct and indirect association studies with a case-control design. The results show that a marked reduction in the statistical power of indirect studies, compared with direct ones, is unavoidable in genome-wide screening for complex disease genes. If the allele frequencies of the disease variant and the marker differ greatly, the disease variant cannot be detected. Because the frequency of the disease variant is unknown, indirect association studies must use SNP markers with a range of allele frequencies, or a large number of SNP markers. However, as the number of SNP markers increases, the obtained P value may fail to reach the significance level after Bonferroni adjustment. Thus, to test directly for association between functional variants and a complex disease, such SNPs should be identified in as many genes as possible for use in genome-wide association studies.
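The dependence on allele-frequency difference and on the number of markers can be illustrated with a minimal Python sketch. It is not the calculation used in the study. It assumes two biallelic loci, uses the standard bound on the squared correlation r^2 implied by the two allele frequencies, and applies the widely used approximation that an indirect test needs about 1/r^2 times the sample size of a direct test. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def max_r2(p_disease: float, p_marker: float) -> float:
    """Largest r^2 possible between two biallelic loci with these allele
    frequencies (the case of complete disequilibrium, |D'| = 1)."""
    for p in (p_disease, p_marker):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # Largest positive D: the haplotype frequency cannot exceed either allele frequency.
    d_max = min(p_disease * (1 - p_marker), (1 - p_disease) * p_marker)
    return d_max ** 2 / (p_disease * (1 - p_disease) * p_marker * (1 - p_marker))


def indirect_sample_size(n_direct: int, r2: float) -> int:
    """Approximate sample size for an indirect test (assumption: n_direct / r^2)."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


def bonferroni_alpha(alpha: float, n_markers: int) -> float:
    """Per-marker significance level after Bonferroni adjustment."""
    return alpha / n_markers


if __name__ == "__main__":
    # Hypothetical example: disease variant at frequency 0.1, marker at 0.5.
    r2 = max_r2(0.1, 0.5)
    print(f"maximum r^2: {r2:.3f}")
    print(f"indirect sample size for 1000 direct: {indirect_sample_size(1000, r2)}")
    print(f"per-marker alpha for 100000 markers: {bonferroni_alpha(0.05, 100_000):.1e}")
```

With the hypothetical frequencies 0.1 and 0.5, the best achievable r^2 is about 0.11, so even a marker in complete disequilibrium would need roughly nine times as many individuals as a direct test under this approximation. The stricter per-marker threshold from the Bonferroni adjustment raises the required sample size further.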
