A hybrid method of the sequential Monte Carlo and the Edgeworth expansion for computation of very small p-values in permutation tests

Permutation tests are useful when parametric assumptions are violated or when the distribution of a test statistic is mathematically intractable. Their major advantage is generality: the procedure applies to most test statistics. The computational cost, however, becomes impractical in high-dimensional settings such as genome-wide association studies, where very small p-values must be resolved. This study provides a comprehensive review of existing methods that compute very small p-values efficiently. A common limitation of these methods is that each applies only to a specific test statistic. To fill this gap, we propose a hybrid method that combines sequential Monte Carlo with an Edgeworth expansion approximation for a studentized statistic, and that is applicable to a variety of test statistics. Simulation results show that the proposed method performs better than competing methods. We further demonstrate the proposed method by analyzing genome-wide association study data from the Study of Addiction: Genetics and Environment (SAGE).
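To make the sequential Monte Carlo ingredient concrete, the sketch below shows a permutation p-value with an early-stopping rule in the style of Besag and Clifford. It is not the hybrid method proposed here and does not include the Edgeworth expansion step. The choice of a two-sided Welch-type t statistic as the studentized statistic, and the names `welch_t`, `sequential_mc_pvalue`, `h`, and `max_perms`, are assumptions made for illustration only.

```python
import math
import random


def welch_t(x, y):
    """Studentized difference in means (Welch-type t statistic)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    # Guard against a degenerate permutation with zero variance in both groups.
    return (mx - my) / se if se > 0 else 0.0


def sequential_mc_pvalue(x, y, h=10, max_perms=9999, seed=0):
    """Sequential Monte Carlo permutation p-value (illustrative sketch).

    Random permutations are drawn until either h permuted statistics are
    at least as extreme as the observed one, or max_perms draws are used.
    """
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    nx = len(x)
    exceed = 0
    for draws in range(1, max_perms + 1):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:nx], pooled[nx:])) >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: h exceedances reached after `draws` permutations.
                return h / draws
    # Budget exhausted: the estimate cannot fall below 1 / (max_perms + 1).
    return (exceed + 1) / (max_perms + 1)


if __name__ == "__main__":
    control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.4, 4.8]
    treated = [5.9, 6.4, 6.1, 5.7, 6.8, 6.0, 6.3, 6.6]
    print(sequential_mc_pvalue(control, treated, h=10, max_perms=999))
```

The early stop saves work when the p-value is large, but the smallest value this estimator can report is 1 / (`max_perms` + 1). Resolving p-values at genome-wide significance levels by plain resampling would therefore require an impractically large number of permutations, which is the cost that motivates approximation-based approaches.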
