A criterion for model selection using minimum description length

Rissanen (1978) proposed that the goodness of fit of a parametric model of the probability density of a random variable could be treated as an information coding problem. He argued that the best model is the one that describes the training data together with the model parameters using the fewest bits of information (Occam's razor). This paper builds upon that basic insight and derives a more general result than Rissanen's, whose work was concerned with time series analysis. To arrive at a model selection criterion with wider applicability, the present derivation relies upon results from information theory and rate-distortion theory.