Higher Criticism in the Context of Unknown Distribution, Non-independence and Classification

Higher criticism has been proposed as a tool for highly multiple hypothesis testing or signal detection, initially in cases where the distribution of a test statistic (or of the noise in a signal) is known and the component tests are statistically independent. In this paper we explore the extent to which the assumptions of known distribution and independence can be relaxed, and we also consider the application of higher criticism to classification. We show that effective distribution approximations can be achieved by using a threshold approach; that is, by disregarding data components unless their significance level exceeds a sufficiently high value. This method exploits the good relative accuracy of approximations to light-tailed distributions. In particular, it can be effective when the true distribution is that of a Studentised mean, or of an average of related type, which is commonly the case in practice. Dependence among vector components is also shown not to be a serious difficulty in many circumstances.
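
For orientation, here is a minimal sketch of a higher criticism statistic in the standard p-value form usually attributed to Donoho and Jin. It is not the specific statistic developed in this paper. The function names, the `alpha0` fraction and the `p_cutoff` argument are illustrative assumptions. The `p_cutoff` argument imitates the threshold idea by ignoring components whose p-values are not small enough, and the standard normal null used to turn scores into p-values is likewise an assumption.

```python
import math


def two_sided_normal_pvalue(z):
    """Two-sided p-value of z under an (assumed) standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def higher_criticism(pvalues, alpha0=0.5, p_cutoff=1.0):
    """Higher criticism over the smallest alpha0 fraction of p-values.

    p_cutoff is a hypothetical thresholding knob: order statistics with
    p-value above p_cutoff are ignored, so only sufficiently significant
    components contribute to the maximum.
    """
    n = len(pvalues)
    sorted_p = sorted(pvalues)
    k_max = max(1, int(alpha0 * n))
    best = float("-inf")
    for i in range(1, k_max + 1):
        p = sorted_p[i - 1]
        if p > p_cutoff:
            break  # all remaining p-values are larger still
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate values to avoid dividing by zero
        # Standardised gap between the empirical fraction i/n and p.
        score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, score)
    return best


# Evenly spread p-values mimic a pure-null sample.
n = 100
null_pvalues = [i / (n + 1) for i in range(1, n + 1)]
# Overwrite a few components with very small p-values (sparse signal).
signal_pvalues = [1e-8] * 5 + null_pvalues[5:]

hc_null = higher_criticism(null_pvalues)
hc_signal = higher_criticism(signal_pvalues, p_cutoff=0.05)
print(hc_null, hc_signal)
```

In this toy comparison the statistic stays small for the evenly spread p-values and becomes large once a handful of components are highly significant, which is the behaviour a detection rule based on higher criticism relies on.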
