MDL/Bayesian Criteria Based on Universal Coding/Measure

In the minimum description length (MDL) and Bayesian criteria, we construct a description length of data \(z^n = z_1 \cdots z_n\) of length \(n\) such that the length divided by \(n\) almost surely converges to the entropy rate as \(n \to \infty\), assuming each \(z_i\) lies in a finite set \(A\). In model selection, if we knew the true probability \(P\) of \(z^n \in A^n\), we would choose the model \(F\) that maximizes the posterior probability of \(F\) given \(z^n\). In many situations, however, only the data \(z^n\) are available, so we use a measure \(Q: A^n \to [0,1]\) satisfying \(\sum_{z^n\in A^n}Q(z^n)\leq 1\) in place of \(P\). In this paper, we consider an extension in which each attribute of the data may be either discrete or continuous. The main issue is what \(Q\) qualifies as an alternative to \(P\) in this generalized setting. We propose a condition in terms of the Radon-Nikodym derivative of \(P\) with respect to \(Q\), and give a procedure for constructing \(Q\) in the general setting. As a result, we obtain MDL/Bayesian criteria in a general sense.
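As a concrete illustration of such a universal measure \(Q\) over a finite alphabet (a classic example, not the construction proposed in this paper), the Krichevsky-Trofimov estimator assigns each sequence a probability whose codelength \(-\log_2 Q(z^n)\), divided by \(n\), converges to the entropy rate. A minimal Python sketch, with the function name `kt_codelength` chosen here for illustration:

```python
from math import log2

def kt_codelength(z, alphabet_size=2):
    """Return -log2 Q(z^n) in bits, where Q is the sequential
    Krichevsky-Trofimov estimator: each symbol s is predicted with
    probability (count(s) + 1/2) / (n_so_far + alphabet_size/2)."""
    counts = [0] * alphabet_size
    total_bits = 0.0
    for sym in z:
        p = (counts[sym] + 0.5) / (sum(counts) + alphabet_size / 2)
        total_bits += -log2(p)
        counts[sym] += 1
    return total_bits

# For a long binary sequence with balanced symbol frequencies, the
# per-symbol codelength approaches the empirical entropy of 1 bit.
z = [0, 1] * 5000
print(kt_codelength(z) / len(z))
```

Because the mixture probabilities sum to at most one over all sequences of length \(n\), \(-\log_2 Q(z^n)\) is a valid (idealized) description length even though the true source \(P\) is unknown.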