Stochastic complexity in learning

This is an expository paper on the latest results in the theory of stochastic complexity and the associated MDL principle with special interest in modeling problems arising in machine learning. As an illustration we discuss the problem of designing MDL decision trees, which are meant to improve the earlier designs in two ways: First, by use of the sharper formula for the stochastic complexity at the nodes the earlier found tendency of getting too small trees appears to be overcome. Secondly, a dynamic programming based pruning algorithm is described for finding the optimal trees, which generalizes an algorithm described in Nohre (1994).

[1]  J. Rissanen Stochastic Complexity and Modeling , 1986 .

[2]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..

[3]  Kenji Yamanishi,et al.  A learning criterion for stochastic rules , 1990, COLT '90.

[4]  Gregory J. Chaitin,et al.  On the Length of Programs for Computing Finite Binary Sequences , 1966, JACM.

[5]  Meir Feder,et al.  A universal finite memory source , 1995, IEEE Trans. Inf. Theory.

[6]  Jorma Rissanen,et al.  Universal coding, information, prediction, and estimation , 1984, IEEE Trans. Inf. Theory.

[7]  Gregory J. Chaitin,et al.  On the Length of Programs for Computing Finite Binary Sequences: statistical considerations , 1969, JACM.

[8]  Jukka Saarinen,et al.  Chaotic time series modeling with optimum neural network architecture , 1993, Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan).

[9]  Jorma Rissanen,et al.  Fisher information and stochastic complexity , 1996, IEEE Trans. Inf. Theory.

[10]  Raphail E. Krichevsky,et al.  The performance of universal encoding , 1981, IEEE Trans. Inf. Theory.

[11]  Lee D. Davisson,et al.  Universal noiseless coding , 1973, IEEE Trans. Inf. Theory.

[12]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[13]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[14]  Ronald L. Rivest,et al.  Inferring Decision Trees Using the Minimum Description Length Principle , 1989, Inf. Comput..