Keep it simple stupid — On the effect of lower-order terms in BIC-like criteria

We study BIC-like model selection criteria. In particular, we approximate the lower-order terms, which typically include the constant log ∫ √det I(θ) dθ, where I(θ) is the Fisher information at parameter value θ. We observe that this constant can be a large negative number that dominates the other terms in the criterion at moderate sample sizes. At least for Markov sources, including the lower-order terms in the criterion dramatically degrades model selection accuracy. The take-home lesson is to keep it simple.
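
To make the size of the constant concrete, the following is a minimal sketch (not taken from the paper) for the m-category multinomial model, where the integral ∫ √det I(θ) dθ has the known closed form π^(m/2) / Γ(m/2). It compares the plain BIC penalty (k/2) log n with the refined penalty (k/2) log(n / (2π)) + log ∫ √det I(θ) dθ from the asymptotic expansion of the stochastic complexity; the function names and the sample size n = 1000 are illustrative assumptions, and the Markov-source case studied in the paper would require per-state constants instead.

    import math

    def log_fisher_constant_multinomial(m):
        # log ∫ sqrt(det I(θ)) dθ for an m-category multinomial;
        # the integral equals π^(m/2) / Γ(m/2).
        return 0.5 * m * math.log(math.pi) - math.lgamma(m / 2)

    def bic_penalty(k, n):
        # Leading BIC term: (k/2) log n.
        return 0.5 * k * math.log(n)

    def refined_penalty(k, n, log_const):
        # Penalty including the lower-order terms:
        # (k/2) log(n / (2π)) + log ∫ sqrt(det I(θ)) dθ.
        return 0.5 * k * math.log(n / (2 * math.pi)) + log_const

    if __name__ == "__main__":
        n = 1000  # a moderate sample size (illustrative)
        for m in (2, 4, 16, 64, 256):
            k = m - 1  # free parameters of the multinomial
            c = log_fisher_constant_multinomial(m)
            print(f"m={m:3d}  constant={c:9.1f}  "
                  f"BIC penalty={bic_penalty(k, n):8.1f}  "
                  f"refined penalty={refined_penalty(k, n, c):8.1f}")

Already for m = 256 the constant is roughly −345, i.e. a sizable fraction of the leading (k/2) log n term at n = 1000, which illustrates how the "correction" can overwhelm the criterion rather than refine it.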
