Genome‐wide significance for dense SNP and resequencing data

Multiple testing is an important problem in genome‐wide association studies, and it will become more important as marker densities increase. The problem has been tackled with permutation procedures, false discovery rate procedures and Bayes factors, but each approach faces difficulties that we briefly review. In the current context of multiple studies on different genotyping platforms, we argue for the use of truly genome‐wide significance thresholds, based on all polymorphisms whether or not they are typed in the study. We approximate genome‐wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, both for all polymorphisms and for a range of single nucleotide polymorphism (SNP) selection criteria. Overall, we find that significance thresholds vary by a factor of more than 20 over the SNP selection criteria and statistical tests that we consider, and that they can be highly dependent on sample size. We compare our results for sequence data to those derived by the HapMap Consortium and find notable differences, which may be due to the small sample sizes used in the HapMap estimate. Genet. Epidemiol. 32:179–185, 2008. © 2007 Wiley‐Liss, Inc.
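
To make the idea of a per-test significance threshold concrete, the following is a minimal sketch of the two textbook corrections (Bonferroni and Šidák) for a given number of tests. It is not the simulation-based procedure described in the abstract: it assumes independent tests, whereas SNPs are correlated through linkage disequilibrium. The function names and the figure of 1,000,000 tests are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test threshold that bounds the family-wise error rate by alpha."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: float) -> float:
    """Per-test threshold giving family-wise error exactly alpha
    under the assumption that the tests are independent.

    Equals 1 - (1 - alpha) ** (1 / n_tests), computed with log1p/expm1
    so that very small thresholds are not lost to rounding.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)


def family_wise_error(per_test: float, n_tests: float) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at level per_test."""
    return -math.expm1(n_tests * math.log1p(-per_test))


if __name__ == "__main__":
    alpha = 0.05
    n_tests = 1_000_000  # hypothetical number of independent tests
    for name, fn in (("Bonferroni", bonferroni_threshold),
                     ("Sidak", sidak_threshold)):
        t = fn(alpha, n_tests)
        print(f"{name}: per-test threshold {t:.3e}, "
              f"family-wise error {family_wise_error(t, n_tests):.4f}")
```

Because correlation between markers lowers the effective number of tests, both corrections are conservative when `n_tests` is set to the raw marker count. That gap is one motivation for estimating thresholds by simulation instead.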
