Can the Strengths of AIC and BIC Be Shared?

It is well known that AIC and BIC have different properties in model selection. BIC is consistent in the sense that if the true model is among the candidates, the probability of selecting the true model approaches 1 as the sample size grows. AIC, on the other hand, is minimax-rate optimal for estimating the regression function in both parametric and nonparametric cases. There are several successful results on constructing new model selection criteria that share some strengths of AIC and BIC. However, we show that, in a rigorous sense, even when the true model is included among the candidates, the above-mentioned main strengths of AIC and BIC cannot be shared. That is, for any model selection criterion to be consistent, it must behave suboptimally compared to AIC in terms of mean average squared error.
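To make the contrast concrete, the following is a minimal sketch of how the two criteria are commonly computed for nested Gaussian linear models. It is an illustration under stated assumptions, not the construction analyzed in the paper: the polynomial candidate family, the simulated data, and the helper name `select_order` are all hypothetical. It assumes the usual forms AIC = n log(RSS/n) + 2k and BIC = n log(RSS/n) + k log n, where k is the number of fitted coefficients.

```python
import numpy as np

def select_order(x, y, max_degree):
    """Return the polynomial degrees chosen by AIC and by BIC.

    Candidates are nested polynomial regressions of degree 0..max_degree
    (a hypothetical candidate family chosen for illustration).
    """
    n = len(y)
    aic, bic = [], []
    for d in range(max_degree + 1):
        # Design matrix with columns 1, x, ..., x^d
        X = np.vander(x, d + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        k = d + 1  # number of fitted coefficients
        fit = n * np.log(rss / n)
        aic.append(fit + 2 * k)          # AIC penalty: 2 per parameter
        bic.append(fit + k * np.log(n))  # BIC penalty: log(n) per parameter
    return int(np.argmin(aic)), int(np.argmin(bic))

# Simulated data in which the true model (degree 1) is among the candidates.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

aic_deg, bic_deg = select_order(x, y, max_degree=6)
print("AIC degree:", aic_deg, "BIC degree:", bic_deg)
```

Because log n exceeds 2 once n is at least 8, BIC penalizes each extra parameter more heavily, so on nested candidates it never selects a larger model than AIC does. That heavier penalty is what drives consistency, and the abstract's result says it cannot be combined with AIC's minimax-rate optimality.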
