Mean Field Theory for Sigmoid Belief Networks

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.
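As a sketch of the kind of bound involved (the notation here is generic variational notation, not taken from the abstract): for a network with hidden units H, visible (evidence) units V, and any distribution Q over the hidden units, Jensen's inequality gives

\[
\log P(V) \;=\; \log \sum_{H} P(H, V) \;\ge\; \sum_{H} Q(H) \log \frac{P(H, V)}{Q(H)} .
\]

A mean field approximation takes Q to be fully factorized over the binary hidden units, \( Q(H) = \prod_i \mu_i^{h_i} (1 - \mu_i)^{1 - h_i} \), and adjusts the variational parameters \( \mu_i \in [0,1] \) to make the lower bound as tight as possible.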
