Gaussian model selection with an unknown variance

Let $Y$ be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean $\mu$ of $Y$ by model selection. More precisely, we start with a collection $\mathcal{S}=\{S_{m},\,m\in\mathcal{M}\}$ of linear subspaces of $\mathbb{R}^{n}$ and associate with each of them the least-squares estimator of $\mu$ on $S_{m}$. We then use a data-driven penalized criterion to select one estimator among these. Our first objective is to analyze the performance of the estimators associated with classical criteria such as FPE, AIC, BIC, and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection $\mathcal{S}$ and the sample size. We then apply these penalties to various statistical problems, including variable selection, change-point detection, and signal estimation. Our results rest on a nonasymptotic risk bound, with respect to the Euclidean loss, for the selected estimator. Analogous results are also established for the Kullback loss.
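To make the selection procedure concrete, here is a minimal sketch of penalized least-squares model selection. It is not the paper's procedure: it assumes each subspace $S_m$ is spanned by a subset of the columns of a design matrix $X$, and it uses the classical unknown-variance criterion $n\log(\mathrm{RSS}_m/n)+\mathrm{pen}(D_m)$ (AIC corresponds to $\mathrm{pen}(D)=2D$, BIC to $\mathrm{pen}(D)=D\log n$); the names `select_model` and `aic_pen` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def select_model(y, X, models, pen):
    """Select a model by minimizing a penalized least-squares criterion.

    y      : (n,) observation vector
    X      : (n, p) design matrix whose column subsets span the S_m
    models : list of tuples of column indices, one per candidate S_m
    pen    : penalty function pen(D, n), e.g. an AIC- or BIC-type penalty
    """
    n = len(y)
    best_m, best_crit = None, np.inf
    for m in models:
        Xm = X[:, list(m)]
        # Least-squares projection of y onto S_m = span(columns of Xm)
        coef, *_ = np.linalg.lstsq(Xm, y, rcond=None)
        resid = y - Xm @ coef
        rss = max(resid @ resid, 1e-30)  # guard against log(0)
        D = Xm.shape[1]
        # Unknown variance: criterion of the form n*log(RSS/n) + pen(D, n)
        crit = n * np.log(rss / n) + pen(D, n)
        if crit < best_crit:
            best_m, best_crit = m, crit
    return best_m, best_crit

# AIC-type penalty, one of the classical criteria analyzed in the paper
aic_pen = lambda D, n: 2 * D
```

With a rich collection of models (e.g., all subsets in variable selection), such fixed penalties can fail; this is precisely the regime where the paper's penalties, which account for the complexity of $\mathcal{S}$ and the sample size, are designed to do better.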
