Adapting to unknown sparsity by controlling the false discovery rate

We attempt to recover an n-dimensional vector observed in white noise, where n is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓ_p norm for small p. We obtain a procedure which is asymptotically minimax for ℓ_r loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q_n also plays a determining role in asymptotic minimaxity. If q = lim q_n ∈ [0, 1/2] and also q_n > γ/log(n), we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, q = lim q_n ∈ (1/2, 1] forces the risk to exceed the minimax risk by a factor growing with q. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2 · log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.
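The data-adaptive thresholding idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the noise level sigma is known, uses a two-sided Benjamini-Hochberg style step-up rule to pick the threshold, and applies hard thresholding. The function names `fdr_threshold` and `fdr_estimate`, the default q = 0.1, and the fallback used when nothing is rejected are all illustrative assumptions.

```python
from statistics import NormalDist


def fdr_threshold(y, q=0.1, sigma=1.0):
    """Pick a threshold with a two-sided step-up rule (illustrative sketch).

    Assumes y[i] = mu[i] + sigma * z[i] with standard normal z[i],
    known sigma, and 0 < q <= 1.
    """
    n = len(y)
    std_normal = NormalDist()
    # Ordered magnitudes, largest first: |y|_(1) >= |y|_(2) >= ...
    abs_sorted = sorted((abs(v) for v in y), reverse=True)

    def t(k):
        # t_k = sigma * z(q * k / (2n)), where z(a) is the upper-a
        # quantile of the standard normal distribution.
        return -sigma * std_normal.inv_cdf(q * k / (2.0 * n))

    # Step-up: take the largest k whose ordered magnitude clears t_k.
    k_hat = 0
    for k, value in enumerate(abs_sorted, start=1):
        if value >= t(k):
            k_hat = k

    # Assumption of this sketch: with no rejections, fall back to t_1.
    return t(k_hat) if k_hat > 0 else t(1)


def fdr_estimate(y, q=0.1, sigma=1.0):
    """Hard-threshold y at the data-adaptive threshold."""
    thr = fdr_threshold(y, q, sigma)
    return [v if abs(v) >= thr else 0.0 for v in y]


if __name__ == "__main__":
    # Two large entries in an otherwise zero vector (no noise added here).
    y = [10.0, -8.0] + [0.0] * 98
    print(fdr_threshold(y))
    print(fdr_estimate(y)[:4])
```

The threshold falls as more coordinates are declared nonzero, so the rule thresholds aggressively on very sparse vectors and more gently on denser ones, which is the sense in which it adapts to unknown sparsity.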
