High-speed Westfall-Young permutation procedure for genome-wide association studies

Genome-wide association studies (GWASs) are widely used to investigate statistically significant associations between diseases and single nucleotide polymorphisms (SNPs) in order to identify causal factors of diseases. Recent GWASs assess the statistical significance of more than one million SNPs, but in many cases no associations are found because conservative multiple testing corrections, such as the Bonferroni correction, are applied. More sensitive methods, such as the Westfall-Young permutation procedure (WY), could associate more SNPs with diseases, but the extremely long computation time of WY has prevented its application to GWAS. We introduce an algorithm that accelerates WY, named the High-speed Westfall-Young permutation procedure (HWY). HWY uses three techniques to make WY computationally practical. First, P-value calculations for SNPs that cannot affect the adjusted significance level are pruned. Second, a lookup table of P-values is used to avoid recomputing the same P-values many times. Finally, computations are parallelized with GPGPU (general-purpose computing on graphics processing units). HWY was 619 times faster than WY and more than 122 times faster than PLINK, a widely used GWAS software package, and analyzed a dataset containing one million SNPs and one thousand individuals in approximately two hours. Re-analysis of existing GWAS datasets with HWY may uncover additional, previously hidden SNP-trait associations.
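Assuming the single-step (minP) variant of WY, the adjusted significance level mentioned above can be written as follows. The notation is introduced here for illustration only: $m$ SNPs, $B$ random permutations of the phenotype labels, $p_j^{(b)}$ the P-value of SNP $j$ under permutation $b$, and a target family-wise error rate $\alpha < 1$.

$$
p_{\min}^{(b)} = \min_{1 \le j \le m} p_j^{(b)}, \qquad
\widehat{\mathrm{FWER}}(\delta) = \frac{1}{B}\,\bigl|\{\, b : p_{\min}^{(b)} < \delta \,\}\bigr|, \qquad
\delta^{*} = \max\{\, \delta : \widehat{\mathrm{FWER}}(\delta) \le \alpha \,\}.
$$

SNP $j$ is declared significant when its observed P-value satisfies $p_j < \delta^{*}$. Equivalently, $\delta^{*}$ is the $(\lfloor \alpha B \rfloor + 1)$-th smallest of the $B$ values $p_{\min}^{(b)}$, so a naive implementation needs on the order of $mB$ P-value evaluations.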
