Small-sample and large-sample statistical model selection criteria

Statistical model selection criteria provide answers to the questions, “How much improvement in fit should be achieved to justify the inclusion of an additional parameter in a model, and on what scale should this improvement in fit be measured?” Mathematically, statistical model selection criteria are defined as estimates of suitable functionals of the probability distributions corresponding to alternative models. This paper discusses different approaches to model-selection criteria, with a view toward illuminating their similarities and differences. The approaches discussed range from explicit, small-sample criteria for highly specific problems to general, large-sample criteria such as Akaike’s information criterion and variants thereof. Special emphasis is given to criteria derived from a Bayesian approach, as this presents a unified way of viewing a variety of criteria. In particular, the derivation of model-selection criteria by asymptotic expansion of the log posterior probabilities of alternative models is reviewed. An information-theoretic approach to model selection, through minimum-bit data representation, is also explored. Finally, the similarity of the asymptotic form of Rissanen’s criterion, obtained from the minimum-bit data representation approach, to the criteria derived from the Bayesian approach is discussed.
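As an illustrative sketch (not taken from the paper), the large-sample criteria discussed above can be compared numerically: AIC charges each additional parameter a fixed penalty of 2 on the −2 log-likelihood scale, while Schwarz's Bayesian criterion, which matches the asymptotic form of Rissanen's minimum description length, charges log n per parameter and so penalizes complexity more heavily once n exceeds e² ≈ 7.4. The polynomial candidate models and simulated data below are hypothetical, chosen only to show the two penalties in action:

```python
import numpy as np

def gaussian_loglik(resid, n):
    # Maximized Gaussian log-likelihood, with sigma^2 set to RSS/n (its MLE)
    sigma2 = np.sum(resid**2) / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(loglik, k):
    # Akaike's information criterion: fixed penalty of 2 per parameter
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    # Schwarz's Bayesian criterion: penalty of log(n) per parameter,
    # the same asymptotic penalty as Rissanen's shortest-data-description criterion
    return -2 * loglik + k * np.log(n)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)    # data simulated from a linear model

# Candidate polynomial models of increasing order
results = {}
for degree in (1, 2, 5):
    X = np.vander(x, degree + 1)          # columns x^degree, ..., x^0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ll = gaussian_loglik(y - X @ beta, n)
    k = degree + 2                        # coefficients plus the error variance
    results[degree] = (aic(ll, k), bic(ll, k, n))

for degree, (a, b) in results.items():
    print(f"degree {degree}: AIC = {a:.1f}, BIC = {b:.1f}")
```

With n = 200, the Bayesian/MDL penalty per parameter is log 200 ≈ 5.3 rather than 2, so the higher-order polynomials must buy a substantially larger improvement in fit to be selected.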

[1] L. J. Savage, Probability and the Weighing of Evidence, 1951.

[2] S. Kullback and R. A. Leibler, On Information and Sufficiency, 1951.

[3] A. Ya. Khinchin, Mathematical Foundations of Information Theory, 1959.

[4] S. Kullback, Information Theory and Statistics, 1960.

[5] W. Hoeffding et al., Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, 1961.

[6] F. J. Anscombe, Topics in the Investigation of Linear Relations Fitted by the Method of Least Squares, 1967.

[7] H. Akaike, Fitting autoregressive models for prediction, 1969.

[8] M. H. DeGroot, Optimal Statistical Decisions, 1970.

[9] M. Aitkin, Simultaneous Inference and the Choice of Variable Subsets in Multiple Regression, 1974.

[10] R. B. Bendel and A. A. Afifi, Comparison of Stopping Rules in Forward “Stepwise” Regression, 1977.

[11] G. Schwarz, Estimating the Dimension of a Model, 1978.

[12] J. Rissanen, Modeling by Shortest Data Description, Automatica, 1978.

[13] E. J. Hannan, The Estimation of the Order of an ARMA Process, 1980.

[14] R. L. Kashyap, Optimal Choice of AR and MA Parts in Autoregressive Moving Average Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1982.

[15] E. Parzen, Maximum entropy interpretation of autoregressive spectral densities, 1982.

[16] M. Lejeune and G. D. Faulkenberry, A Simple Predictive Density Function, 1982.

[17] H. Akaike, Statistical Inference and Measurement of Entropy, 1983.

[18] J. A. Hartigan, A failure of likelihood asymptotics for normal mixtures, 1985.

[19] H. Akaike, Prediction and Entropy, 1985.

[20] J. Rissanen, Stochastic Complexity and Modeling, 1986.

[21] H. Akaike, Factor analysis and AIC, 1987.

[22] L. A. Baxter, A Celebration of Statistics: The ISI Centenary Volume, 1991.

[23] S. L. Sclove, Some Aspects of Model-Selection Criteria, 1994.