Information criteria: How do they behave in different models?

Choosing the best model is a crucial step in modeling data, and parsimony is one of the principles that must guide this choice. Although the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) are widely used in model selection, their foundations are, in general, poorly understood. All three criteria penalize the likelihood so that simpler models are favored. They are based on concepts of information and entropy, which this work explains from a statistical point of view. The three criteria are compared through Monte Carlo simulations, and their application is investigated in three settings: the selection of normal models, of biological growth models and of time series models. In the simulations with normal models, all three criteria performed poorly for a small sample size of N=100, particularly when the variances were only slightly different. In the simulations with biological growth models and a very small sample size of N=13, the AIC and AICc performed better than the BIC. The simulations based on time series models produced results similar to those for the normal models: for a small sample size of N=100, the BIC outperformed the AIC and AICc in some cases, but in other cases it performed as poorly as they did.
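As a rough illustration of how the three criteria penalize the likelihood, the sketch below uses their standard textbook definitions, which the abstract itself does not spell out: AIC = -2 log L + 2k, AICc = AIC + 2k(k+1)/(N-k-1) and BIC = -2 log L + k log N, where L is the maximized likelihood, k the number of estimated parameters and N the sample size. The function names, the two candidate normal models and the simulated data are assumptions made for this example only; they are not the simulation design of the paper.

```python
import math
import random


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)


def normal_loglik(data, mu, sigma2):
    """Log-likelihood of i.i.d. normal data with mean mu and variance sigma2."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)


def compare_normal_models(data):
    """Score two illustrative (hypothetical) candidate models on the same data.

    Model "fixed_mean": mean fixed at 0, variance estimated (k = 1).
    Model "free_mean":  mean and variance both estimated (k = 2).
    """
    n = len(data)
    mean = sum(data) / n
    # Maximum-likelihood variance estimates under each model.
    s2_fixed = sum(x * x for x in data) / n
    s2_free = sum((x - mean) ** 2 for x in data) / n
    candidates = {
        "fixed_mean": (normal_loglik(data, 0.0, s2_fixed), 1),
        "free_mean": (normal_loglik(data, mean, s2_free), 2),
    }
    return {
        name: {
            "loglik": ll,
            "AIC": aic(ll, k),
            "AICc": aicc(ll, k, n),
            "BIC": bic(ll, k, n),
        }
        for name, (ll, k) in candidates.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # N = 100
    for name, scores in compare_normal_models(sample).items():
        print(name, {key: round(value, 3) for key, value in scores.items()})
```

For each criterion, the model with the lowest value is preferred. Because log N exceeds 2 once N is at least 8, the BIC penalty per parameter is heavier than the AIC penalty at both sample sizes mentioned above (N=13 and N=100), while the AICc correction term shrinks toward zero as N grows relative to k.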
