Estimation of significance thresholds for genomewide association scans

The appropriate significance threshold for genomewide association studies remains unresolved. Previous theoretical suggestions have yet to be validated in practice, and permutation testing does not resolve the discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case‐Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P‐value corresponding to a 5% family‐wise error rate. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10⁻⁸. To reduce computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P‐value from permutation replicates, we showed that the effective number of tests is a useful heuristic, and we suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but that with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
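The link between the minimum P-value across permutation replicates, a Beta distribution, and an effective number of tests can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplified case in which the first Beta parameter is fixed at 1, so that the minimum of m independent uniform P-values follows Beta(1, m), and it replaces real permutation replicates with synthetic uniform P-values. The function names, the seed, and the values of `true_m` and `n_reps` are all hypothetical.

```python
import numpy as np


def empirical_threshold(min_p, alpha=0.05):
    """Nominal P-value giving family-wise error alpha: the alpha-quantile
    of the minimum P-values observed across permutation replicates."""
    return float(np.quantile(np.asarray(min_p, dtype=float), alpha))


def effective_tests_mle(min_p):
    """Maximum-likelihood estimate of m under the assumption min P ~ Beta(1, m).
    The log-likelihood n*log(m) + (m - 1)*sum(log(1 - p)) is maximised at
    m = -n / sum(log(1 - p))."""
    min_p = np.asarray(min_p, dtype=float)
    return float(-min_p.size / np.sum(np.log1p(-min_p)))


def threshold_from_effective_tests(m_eff, alpha=0.05):
    """alpha-quantile of Beta(1, m_eff), i.e. the Sidak-corrected threshold."""
    return float(1.0 - (1.0 - alpha) ** (1.0 / m_eff))


def effective_tests_from_threshold(threshold, alpha=0.05):
    """Invert the Sidak formula: the m implied by a given nominal threshold."""
    return float(np.log1p(-alpha) / np.log1p(-threshold))


# Synthetic stand-in for permutation replicates (assumption): each replicate
# is the minimum of true_m independent uniform P-values. Real genotype data
# are correlated, which is why the effective number must be estimated.
rng = np.random.default_rng(0)
true_m, n_reps = 1000, 5000
min_p = rng.uniform(size=(n_reps, true_m)).min(axis=1)

m_hat = effective_tests_mle(min_p)
t_empirical = empirical_threshold(min_p)
t_beta = threshold_from_effective_tests(m_hat)
m_reported = effective_tests_from_threshold(7.2e-8)

print(f"estimated effective number of tests: {m_hat:.0f}")
print(f"empirical 5% threshold:              {t_empirical:.3g}")
print(f"Beta(1, m) 5% threshold:             {t_beta:.3g}")
print(f"m implied by a 7.2e-8 threshold:     {m_reported:.3g}")
```

Under this Beta(1, m) simplification, the reported threshold of about 7.2 × 10⁻⁸ would correspond to roughly 7 × 10⁵ effective tests. Fitting both Beta parameters, as a general Beta fit would, relaxes the independence assumption built into this sketch.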
