Bayesian networks for lossless dataset compression

The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has been focussed primarily on their use in diagnostics, prediction and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to then compress the datasets. These algorithms often achieve significantly better compression ratios than those achieved with common dictionary-based algorithms such as those used by programs like ZIP.
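The intuition behind model-based compression can be illustrated with a small sketch. It is not the paper's algorithm: it only computes the ideal code length (the sum of -log2 p over all values) that an entropy coder such as an arithmetic coder could approach, first when every column is coded independently and then when one column is coded conditionally on a parent column, as an edge in a Bayesian network would allow. The function names, the toy dataset, and the use of empirical frequencies as probabilities are all assumptions made for illustration, and the cost of storing the model itself is ignored.

```python
import math
from collections import Counter

def marginal_bits(rows, col):
    """Ideal bits to code column `col` using its empirical marginal distribution."""
    n = len(rows)
    counts = Counter(row[col] for row in rows)
    return -sum(c * math.log2(c / n) for c in counts.values())

def conditional_bits(rows, child, parent):
    """Ideal bits to code column `child` given the value of column `parent`."""
    joint = Counter((row[parent], row[child]) for row in rows)
    parent_counts = Counter(row[parent] for row in rows)
    return -sum(c * math.log2(c / parent_counts[p]) for (p, _), c in joint.items())

# Hypothetical toy dataset: the second column always equals the first.
rows = [(0, 0), (0, 0), (1, 1), (1, 1)]

# Independent model: each column is coded on its own.
independent = marginal_bits(rows, 0) + marginal_bits(rows, 1)

# Dependent model: column 1 is coded conditionally on column 0.
dependent = marginal_bits(rows, 0) + conditional_bits(rows, child=1, parent=0)

print(independent)  # 8.0
print(dependent)    # 4.0
```

In this toy case the second column is fully determined by the first, so conditioning on it removes its cost entirely. A real comparison would also have to charge for transmitting the network structure and its probability tables.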
