Robust test for detecting a signal in a high dimensional sparse normal vector

Abstract Let $Z_i$, $i = 1, \dots, n$, be independent random variables with $E Z_i = \mu_i$ and $\mathrm{Var}(Z_i) = 1$. We consider the problem of testing $H_0: \mu_i = 0$, $i = 1, \dots, n$, when $n$ is large and the vector $(\mu_1, \dots, \mu_n)$ is ‘sparse’, e.g., $\sum_{i=1}^{n} \mu_i^2 = o(n)$. We suggest a robust test that is not sensitive to the exact tail behavior implied under normality assumptions. In particular, our test is ‘robust’ if the ‘moderate deviation’ tail of the distribution of $Z_i$ can be represented as the product of the tail of a standard normal and a ‘slowly changing’ function. This implies that whenever an Anderson–Darling type of test is robust, our proposed test is also ‘robust’. One situation where this tail behavior is expected is when the $Z_i$ are of the form $Z_i = \sum_{j=1}^{m} Y_{ij} / m$, for large $m$, $m \ll n$, where the $Y_{ij}$ are independent and identically distributed. For $Z_i$ of this form we show that our test is ‘robust’ when $\log n = o(m)$, while Anderson–Darling type tests are robust only when $(\log n)^3 = o(m)$. We provide examples and simulation evidence to demonstrate the robustness of our proposed test and the need for such robust tests. We also present a real data example highlighting the importance of robustness.
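To make the tail condition concrete, the following is a minimal sketch, not the test proposed in the paper. It assumes, purely for illustration, that the $Y_{ij}$ are centered Exp(1) variables and that the sum is scaled by $\sqrt{m}$ so that $\mathrm{Var}(Z_i) = 1$; the function names `normal_sf`, `gamma_sf`, and `tail_ratio` are hypothetical. Under these assumptions the upper tail of $Z_i$ is an exact Gamma tail, so its ratio to the standard normal tail at the threshold $\sqrt{2 \log n}$ can be computed without simulation.

```python
import math


def normal_sf(t):
    """Upper tail P(N(0,1) > t) of the standard normal."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def gamma_sf(m, x):
    """P(Gamma(m, 1) > x) for integer m >= 1 and x > 0.

    Uses the identity P(Gamma(m, 1) > x) = P(Poisson(x) <= m - 1),
    with each Poisson term evaluated in log space to avoid underflow.
    """
    log_x = math.log(x)
    return math.fsum(
        math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(m)
    )


def tail_ratio(m, t):
    """Ratio P(Z > t) / P(N(0,1) > t) for the illustrative Z.

    Assumption of this sketch: Z = (S - m) / sqrt(m), where S is a sum
    of m i.i.d. Exp(1) variables, so S ~ Gamma(m, 1) and Var(Z) = 1.
    """
    x = m + t * math.sqrt(m)
    return gamma_sf(m, x) / normal_sf(t)


if __name__ == "__main__":
    n = 10_000                          # number of hypotheses (illustrative)
    t = math.sqrt(2.0 * math.log(n))    # threshold on the sqrt(2 log n) scale
    for m in (10, 100, 1000):
        print(f"m = {m:5d}  tail ratio = {tail_ratio(m, t):.3f}")
```

In this illustration the ratio stays above 1 and shrinks toward 1 as $m$ grows, which is the kind of departure from exact normal tails that a normal-calibrated test does not account for when $m$ is small relative to powers of $\log n$.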
