Convergence of Discrete MDL for Sequential Prediction

We study the properties of the Minimum Description Length principle for sequence prediction, considering a two-part MDL estimator chosen from a countable class of models. This applies in particular to the important case of universal sequence prediction, where the model class corresponds to all algorithms for some fixed universal Turing machine (the correspondence is via enumerable semimeasures, so the resulting models are stochastic). We prove convergence theorems similar to Solomonoff's theorem of universal induction, which also holds for general Bayes mixtures. The bound characterizing the convergence speed of MDL predictions is exponentially larger than the corresponding bound for Bayes mixtures. We observe that there are at least three different ways of using MDL for prediction. One of these has worse prediction properties: its predictions converge only if the MDL estimator stabilizes. We establish sufficient conditions for this to occur. Finally, we prove some immediate consequences for complexity relations and randomness criteria.
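To make the setting concrete, here is a minimal sketch (not taken from the paper) contrasting two-part MDL prediction with Bayes mixture prediction over a small finite class of Bernoulli models. The model class, its prior weights, and all function names are illustrative assumptions; the paper itself works with a countable class of enumerable semimeasures.

```python
import math

# Hypothetical model class: Bernoulli parameters p mapped to prior weights w(p),
# with w(p) playing the role of 2^(-description length of the model).
models = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}

def log_nu(p, x):
    """Log-probability of the binary string x under Bernoulli(p)."""
    ones = sum(x)
    return ones * math.log(p) + (len(x) - ones) * math.log(1 - p)

def mdl_predict(x):
    """Two-part MDL: pick the model minimizing the code length
    -log w(nu) - log nu(x), then predict the next bit with that model alone."""
    best_p = max(models, key=lambda p: math.log(models[p]) + log_nu(p, x))
    return best_p                                    # P(next bit = 1)

def bayes_predict(x):
    """Bayes mixture: posterior-weighted average of all models' predictions."""
    post = {p: w * math.exp(log_nu(p, x)) for p, w in models.items()}
    z = sum(post.values())
    return sum(p * q for p, q in post.items()) / z   # P(next bit = 1)

x = [1, 1, 0, 1, 1, 1, 0, 1]                          # observed prefix
print("MDL   prediction:", mdl_predict(x))
print("Bayes prediction:", bayes_predict(x))
```

The MDL predictor commits to the single model that minimizes the two-part code length, while the Bayes predictor averages all models by posterior weight. Roughly, the convergence bound for MDL in this setting scales like the inverse prior weight of the true model, whereas the Bayes mixture bound scales only like its logarithm; this is the "exponentially larger" gap referred to in the abstract.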
