MDL/Bayesian Criteria Based on Universal Coding/Measure

In the minimum description length (MDL) and Bayesian criteria, we construct a description length of data \(z^n = z_1 \cdots z_n\) of length \(n\) such that the length divided by \(n\) almost surely converges to the entropy rate as \(n \to \infty\), assuming each \(z_i\) lies in a finite set \(A\). In model selection, if we knew the true probability \(P\) of \(z^n \in A^n\), we would choose the model \(F\) that maximizes the posterior probability of \(F\) given \(z^n\). In many situations, however, only the data \(z^n\) are available, so we use a measure \(Q: A^n \to [0,1]\) satisfying \(\sum_{z^n\in A^n}Q(z^n)\leq 1\) in place of \(P\). In this paper, we consider an extension in which each attribute of the data may be either discrete or continuous. The main issue is what \(Q\) qualifies as an alternative to \(P\) in this generalized setting. We propose a condition in terms of the Radon-Nikodym derivative of \(P\) with respect to \(Q\), and give a procedure for constructing \(Q\) in the general setting. As a result, we obtain MDL/Bayesian criteria in a general sense.
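As a concrete illustration of such a universal measure \(Q\) over a finite alphabet (a classic example, not the construction proposed in this paper), the Krichevsky-Trofimov estimator assigns each sequence a probability whose codelength \(-\log_2 Q(z^n)\), divided by \(n\), converges to the entropy rate. A minimal Python sketch, with the function name `kt_codelength` chosen here for illustration:

```python
from math import log2

def kt_codelength(z, alphabet_size=2):
    """Return -log2 Q(z^n) in bits, where Q is the sequential
    Krichevsky-Trofimov estimator: each symbol s is predicted with
    probability (count(s) + 1/2) / (n_so_far + alphabet_size/2)."""
    counts = [0] * alphabet_size
    total_bits = 0.0
    for sym in z:
        p = (counts[sym] + 0.5) / (sum(counts) + alphabet_size / 2)
        total_bits += -log2(p)
        counts[sym] += 1
    return total_bits

# For a long binary sequence with balanced symbol frequencies, the
# per-symbol codelength approaches the empirical entropy of 1 bit.
z = [0, 1] * 5000
print(kt_codelength(z) / len(z))
```

Because the mixture probabilities sum to at most one over all sequences of length \(n\), \(-\log_2 Q(z^n)\) is a valid (idealized) description length even though the true source \(P\) is unknown.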