The ABC of Model Selection: AIC, BIC and the New CIC

The geometric theory of ignorance suggests new criteria for model selection. One example is to choose the model M that minimizes

\[
\mathrm{CIC} \;=\; -\sum_{i=1}^{N} \log p(x_i) \;+\; \frac{d}{2}\,\log\frac{N}{2\pi} \;+\; \log V \;+\; \frac{\pi R}{N}\,\log(d+1),
\]

where (x1,…,xN) is a sample of N iid observations, p ∈ M is the MLE, d = dim(M) is the dimension of the model M, V = Vol(M) is its information volume, and R = Ricci(M) is the Ricci scalar evaluated at the MLE. I study the performance of CIC on the following bit-stream segmentation problem: find n from N iid samples of a complete DAG of n bits. The CIC criterion outperforms AIC and BIC by orders of magnitude when n > 3, and it is still better, though not by as wide a margin, for n = 2 and n = 3.
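To make the comparison concrete, here is a minimal sketch (not from the paper) that scores candidate block sizes n with all three criteria on the −log-likelihood scale used in the formula above. The helper `neg_log_lik_complete_dag`, the random bit stream, and the placeholder values for V and R are illustrative assumptions; in the actual experiment V = Vol(M) and R = Ricci(M) must be computed from the information geometry of the complete-DAG model.

```python
import numpy as np

def neg_log_lik_complete_dag(bits, n):
    """-log-likelihood of a bit stream under the saturated (complete DAG)
    model on n-bit blocks, evaluated at the MLE (empirical block frequencies).
    Illustrative helper, not taken from the paper."""
    bits = np.asarray(bits)
    N = len(bits) // n                       # number of n-bit observations
    blocks = bits[: N * n].reshape(N, n)
    # encode each n-bit block as an integer in {0, ..., 2^n - 1}
    codes = blocks @ (1 << np.arange(n)[::-1])
    counts = np.bincount(codes, minlength=2 ** n)
    nonzero = counts[counts > 0]
    return -np.sum(nonzero * np.log(nonzero / N)), N

def aic(nll, d):
    # AIC on the -log p scale: penalty d
    return nll + d

def bic(nll, d, N):
    # BIC on the -log p scale: penalty (d/2) log N
    return nll + 0.5 * d * np.log(N)

def cic(nll, d, N, V, R):
    # CIC exactly as in the formula above; V = Vol(M), R = Ricci(M) at the MLE
    return (nll
            + 0.5 * d * np.log(N / (2.0 * np.pi))
            + np.log(V)
            + (np.pi * R / N) * np.log(d + 1))

# Example: score candidate block sizes n = 1..4 on a random bit stream.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=6000)
for n in range(1, 5):
    nll, N = neg_log_lik_complete_dag(bits, n)
    d = 2 ** n - 1                           # parameters of the saturated model
    V, R = 1.0, 0.0                          # placeholders only: the real values
                                             # come from the geometry of M
    print(n, aic(nll, d), bic(nll, d, N), cic(nll, d, N, V, R))
```

With the placeholder V = 1 and R = 0 the volume and curvature terms vanish and CIC reduces to the familiar (d/2) log(N/2π) penalty; the point of the criterion is precisely the extra information supplied by those two geometric quantities.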