Analytic P-value calculation for the higher criticism test in finite d problems.

The higher criticism test is effective for testing a joint null hypothesis against a sparse alternative, e.g., for testing the effect of a gene or a genetic pathway that consists of d genetic markers. Accurate p-value calculations for the higher criticism based on its asymptotic distribution require a very large d, which is not the case for the number of genetic variants in a gene or a pathway. In this paper we propose an analytic method that accurately computes the p-value of the higher criticism test for finite-d problems. Unlike previous treatments, this method relies on neither asymptotics in d nor simulation, and it is exact for arbitrary d when the test statistics are normally distributed. The method is particularly advantageous computationally when d is not large. We illustrate the proposed method with a case-control genome-wide association study of lung cancer and compare its power to that of competing methods through simulations.
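For readers unfamiliar with the statistic itself, the following is a minimal sketch of how a higher criticism statistic can be computed from d marginal test statistics. It uses the standardized-exceedance form commonly attributed to Donoho and Jin, which may differ in detail from the exact definition used in the paper, and it assumes the d statistics are independent N(0, 1) under the null. The function name `higher_criticism` is hypothetical. The sketch computes only the statistic; it does not implement the analytic p-value calculation proposed in the paper.

```python
import math


def higher_criticism(z_scores):
    """Higher criticism statistic from d marginal z-scores.

    Assumption (not from the paper): uses the form
        max_i sqrt(d) * (i/d - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(d) are the sorted two-sided p-values.
    """
    d = len(z_scores)
    if d == 0:
        raise ValueError("need at least one test statistic")

    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = sorted(math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores)

    best = -math.inf
    for i, p in enumerate(pvals, start=1):
        # Skip degenerate p-values, where the standardization is undefined.
        if p <= 0.0 or p >= 1.0:
            continue
        standardized = math.sqrt(d) * (i / d - p) / math.sqrt(p * (1.0 - p))
        best = max(best, standardized)
    return best


# Illustrative inputs only (made-up numbers, not data from the paper).
null_like = [0.1, -0.3, 0.7, -1.1, 0.4]
sparse_signal = [0.1, -0.3, 0.7, -1.1, 5.0]  # one marker carries a strong effect

hc_null = higher_criticism(null_like)
hc_sparse = higher_criticism(sparse_signal)
```

With a single strong marker among otherwise unremarkable ones, `hc_sparse` is far larger than `hc_null`, which is the sparse-alternative behavior the abstract describes. Turning the observed statistic into a p-value for small d is the problem the paper addresses.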
