AIC and Cp as estimators of loss for spherically symmetric distributions

In this article, we develop a modern perspective on Akaike’s Information Criterion (AIC) and Mallows’ Cp for model selection. Despite the differences in their respective motivations, the two criteria are equivalent in the special case of Gaussian linear regression. In this case they are also equivalent to a third criterion, an unbiased estimator of the quadratic prediction loss, derived from loss estimation theory. Our first contribution is to provide an explicit link between loss estimation and model selection through a new oracle inequality. We then show that the form of the unbiased estimator of the quadratic prediction loss obtained under a Gaussian assumption still holds under a more general distributional assumption, namely the family of spherically symmetric distributions. A feature of our results is that the criterion does not depend on the specific form of the distribution, but only on its spherical symmetry. Moreover, this family of distributions allows for dependence between the observations, a case that is not often studied.
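
To make the equivalence in the Gaussian case concrete, here is a sketch under assumptions that the abstract does not spell out: a linear model $Y = X\beta + \sigma\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I_n)$, a known variance $\sigma^2$, and a fixed candidate submodel using $k$ columns $X_k$ of $X$, fitted by least squares. The notation ($X_k$, $\hat\beta_k$, $\delta_0$) is illustrative and may differ from that used in the article.

```latex
\begin{align*}
C_p          &= \frac{\|Y - X_k\hat\beta_k\|^2}{\sigma^2} + 2k - n, \\
\mathrm{AIC} &= \frac{\|Y - X_k\hat\beta_k\|^2}{\sigma^2} + 2k + n\log(2\pi\sigma^2), \\
\delta_0     &= \|Y - X_k\hat\beta_k\|^2 + (2k - n)\,\sigma^2,
\qquad
\mathbb{E}[\delta_0] = \mathbb{E}\,\|X_k\hat\beta_k - X\beta\|^2 .
\end{align*}
% Consequently
%   \delta_0 = \sigma^2 C_p   and   AIC = C_p + n + n\log(2\pi\sigma^2),
% so the three criteria differ only by affine maps that do not depend on k.
```

Under these assumptions the three criteria are increasing affine functions of one another, with coefficients that do not depend on the candidate model, so minimizing any one of them over a collection of submodels selects the same model.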
