Efficient p-value evaluation for resampling-based tests.

Resampling-based tests, which typically rely on permutation or bootstrap procedures, are widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. Assessing significance requires recomputing the test statistic on a large number of simulated data sets, which can be very computationally intensive. Here, we propose an efficient p-value evaluation procedure that adapts the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can readily be used to estimate the p-value of any resampling-based test. We show through numerical simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^-6). With the computational burden reduced by the proposed procedure, the versatile resampling-based test becomes computationally feasible for a much wider range of applications. We demonstrate the new method by applying it to a large-scale genetic association study of prostate cancer.
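To make the computational burden concrete, the following is a minimal Python sketch of the standard resampling baseline the abstract refers to, not of the proposed stochastic approximation procedure. The function name `permutation_p_value`, the choice of test statistic (absolute difference in group means), and the add-one correction are illustrative assumptions, not details taken from the paper.

```python
import random


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo permutation p-value for a difference in group means.

    Illustrative baseline only: the statistic and the add-one
    correction are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def statistic(values):
        # Absolute difference between the means of the two groups.
        a, b = values[:n_x], values[n_x:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(pooled)
    exceed = 0
    for _ in range(n_resamples):
        # Permute the group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        if statistic(pooled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

With this estimator the smallest attainable value is 1/(n_resamples + 1), so resolving a p-value below 10^-6 requires more than a million recomputations of the test statistic. That cost is what motivates a more efficient evaluation procedure.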
