Mean Field Theory for Sigmoid Belief Networks

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.
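To make the abstract's central claim concrete, the following is a standard sketch of how a factorized (mean field) distribution yields a lower bound on the likelihood of evidence via Jensen's inequality; the notation (hidden units $H$, visible units $V$, weights $J_{ij}$, biases $h_i$, mean field parameters $\mu_i$) is illustrative and not quoted from the paper.

\[
  \ln P(V) \;=\; \ln \sum_{H} P(H, V)
           \;=\; \ln \sum_{H} Q(H)\,\frac{P(H, V)}{Q(H)}
           \;\ge\; \sum_{H} Q(H) \ln \frac{P(H, V)}{Q(H)}
  \qquad \text{(Jensen's inequality)},
\]
where, in a sigmoid belief network with binary units $S_i$ and parents $\mathrm{pa}(S_i)$,
\[
  P(S_i = 1 \mid \mathrm{pa}(S_i)) \;=\; \sigma\!\Big(\sum_j J_{ij} S_j + h_i\Big),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}},
\]
and the mean field approximation takes $Q$ to be fully factorized over the hidden units,
\[
  Q(H) \;=\; \prod_{i \in H} \mu_i^{S_i} (1 - \mu_i)^{1 - S_i}.
\]

The gap in the bound equals the KL divergence $\mathrm{KL}\big(Q(H)\,\|\,P(H \mid V)\big)$, so maximizing the bound over the parameters $\mu_i$ simultaneously tightens the likelihood estimate and improves the approximation to the true posterior.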
