Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is boosted further by statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of so many SNPs must be taken into account when interpreting statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs caused by linkage disequilibrium (LD). Several groups have proposed using the effective number of independent markers (Me) to adjust for multiple testing, but current methods of calculating Me are limited in accuracy or computational speed. Here, we report a faster and more robust method to calculate Me. Applying this method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for the HapMap Project and 1000 Genomes Project datasets, which are widely used as reference panels in genotype imputation. Our results suggest a p-value threshold of ~10⁻⁷ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent thresholds of ~5 × 10⁻⁸ for current or merged commercial genotyping arrays, ~10⁻⁸ for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10⁻⁸ for only the common SNPs within genes.
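The link between Me and a p-value threshold can be illustrated with a short sketch. It assumes the common Bonferroni-style adjustment, in which the genome-wide type 1 error rate is divided by Me; the abstract does not state which adjustment was used, so this is an assumption, as are the function names. The Me values it derives are back-of-envelope figures implied by the thresholds quoted above, not values reported by the authors, and the sketch does not reproduce the GEC method for estimating Me.

```python
import math


def bonferroni_threshold(alpha: float, me: float) -> float:
    """Per-test p-value threshold for `me` effectively independent tests (assumed Bonferroni form)."""
    if me <= 0:
        raise ValueError("me must be positive")
    return alpha / me


def sidak_threshold(alpha: float, me: float) -> float:
    """Sidak alternative: 1 - (1 - alpha)^(1/me), computed stably for large me."""
    if me <= 0:
        raise ValueError("me must be positive")
    return -math.expm1(math.log1p(-alpha) / me)


def implied_me(alpha: float, threshold: float) -> float:
    """Effective number of tests implied by a threshold under the Bonferroni form."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


if __name__ == "__main__":
    alpha = 0.05  # genome-wide type 1 error rate used in the abstract
    # Thresholds quoted in the abstract; labels are shorthand, not official names.
    quoted = [
        ("early arrays", 1e-7),
        ("current or merged arrays", 5e-8),
        ("1000 Genomes common SNPs", 1e-8),
    ]
    for label, threshold in quoted:
        print(f"{label}: threshold {threshold:.0e} implies Me of about {implied_me(alpha, threshold):.3g}")
```

For small error rates the two adjustments nearly coincide, which is why the simpler division by Me is usually sufficient for reasoning about orders of magnitude.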
