Bayesian networks for pattern classification, data compression, and channel coding

Pattern classification, data compression, and channel coding are tasks that usually must deal with complex but structured natural or artificial systems. The patterns we wish to classify and the images we wish to compress are consequences of causal physical processes, and the noisy output of a telephone line is a corrupted version of a signal produced by a structured, man-made telephone modem. Not only are these tasks characterized by complex structure, but they also contain random elements. Graphical models such as Bayesian networks provide a way to describe the relationships between the random variables in a stochastic system. In this thesis, I use Bayesian networks as an overarching framework to describe and solve problems in pattern classification, data compression, and channel coding. Results on the classification of handwritten digits show that Bayesian network pattern classifiers outperform other standard methods, such as the k-nearest neighbor method. When Bayesian networks are used as source models for data compression, an exponentially large number of codewords is associated with each input pattern; it turns out that such a code can still be used efficiently, by means of a new technique called "bits-back coding". Several new error-correcting decoding algorithms are instances of "probability propagation" in various Bayesian networks. These new schemes are rapidly closing the gap between the performance of practical channel coding systems and Shannon's 50-year-old channel coding limit. The Bayesian network framework exposes the similarities between these codes and leads the way to a new class of "trellis-constraint codes", which also operate close to Shannon's limit.
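To make the central idea concrete, the following is a minimal sketch (my own illustration, not taken from the thesis; all probability tables are made-up numbers) of probability propagation, i.e. sum-product message passing, computing a posterior marginal in a three-node chain Bayesian network X -> Y -> Z, whose joint distribution factors as P(x, y, z) = P(x) P(y|x) P(z|y):

    # Probability propagation on the chain X -> Y -> Z (Python/NumPy).
    # Illustrative tables only; binary variables for simplicity.
    import numpy as np

    p_x = np.array([0.6, 0.4])                    # P(X)
    p_y_given_x = np.array([[0.7, 0.3],           # P(Y | X=0)
                            [0.2, 0.8]])          # P(Y | X=1)
    p_z_given_y = np.array([[0.9, 0.1],           # P(Z | Y=0)
                            [0.4, 0.6]])          # P(Z | Y=1)

    z_obs = 1  # observed evidence Z = 1

    # Backward message from the evidence: lambda_y(y) = P(Z=z_obs | y)
    lambda_y = p_z_given_y[:, z_obs]

    # Propagate through P(Y|X): lambda_x(x) = sum_y P(y|x) * lambda_y(y)
    lambda_x = p_y_given_x @ lambda_y

    # Combine with the prior and normalize: P(X | Z=z_obs)
    posterior_x = p_x * lambda_x
    posterior_x /= posterior_x.sum()

    # Sanity check by brute-force enumeration of the joint P(x, y, z)
    joint = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]
    brute = joint[:, :, z_obs].sum(axis=1)        # marginalize out y
    brute /= brute.sum()

    print(posterior_x)   # message-passing answer
    print(brute)         # identical, by the chain-rule factorization

The decoding algorithms described in the abstract apply this same message-passing computation to the much larger graphs defined by error-correcting codes, where the per-node cost of exact local updates stays small even though brute-force enumeration over all codewords is intractable.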
