The Minimum Description Length Principle in Coding and Modeling

We review the principles of minimum description length and stochastic complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
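As a concrete illustration of the three codings named above, the following minimal sketch computes code lengths for the simplest parametric family, binary sequences under a Bernoulli model. The choices here are assumptions for illustration and are not taken from the paper: code lengths are in nats, the mixture uses the Jeffreys Beta(1/2, 1/2) prior, and the predictive code uses the add-1/2 rule, which for this model is the sequential form of that same mixture. All function names are hypothetical.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def max_log_lik(n, k):
    """Maximized Bernoulli log-likelihood for a sequence with k ones out of n."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def nml_log_normalizer(n):
    """Log of the sum, over all 2^n sequences, of the maximized likelihood."""
    terms = [log_binom(n, k) + max_log_lik(n, k) for k in range(n + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def nml_code_length(n, k):
    """Normalized maximum likelihood code length of one sequence with k ones."""
    return -max_log_lik(n, k) + nml_log_normalizer(n)


def mixture_code_length(n, k):
    """Code length under the Bayes mixture with a Beta(1/2, 1/2) prior (assumed)."""
    log_marginal = (
        math.lgamma(k + 0.5) + math.lgamma(n - k + 0.5) - math.lgamma(n + 1)
        - 2.0 * math.lgamma(0.5)
    )
    return -log_marginal


def predictive_code_length(bits):
    """Sequential code length using the add-1/2 plug-in predictor (assumed)."""
    ones, total = 0, 0.0
    for t, b in enumerate(bits):
        p_one = (ones + 0.5) / (t + 1.0)
        total -= math.log(p_one if b == 1 else 1.0 - p_one)
        ones += b
    return total


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 100
    n, k = len(bits), sum(bits)
    print("NML        :", nml_code_length(n, k))
    print("mixture    :", mixture_code_length(n, k))
    print("predictive :", predictive_code_length(bits))
    # Known asymptotic form of the NML penalty for one free parameter.
    print("log C_n    :", nml_log_normalizer(n), "vs", 0.5 * math.log(n * math.pi / 2))
```

For this one-parameter family the NML penalty `nml_log_normalizer(n)` grows like (1/2) log(n π / 2), and the three code lengths differ from one another by amounts that stay small relative to n, which is the sense in which each coding achieves the stochastic complexity.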
