Belief Propagation, Mean-Field, and Bethe Approximations

This chapter describes approximate methods for estimating the marginals and the maximum a posteriori (MAP) estimates of probability distributions defined over graphs, including Mean Field Theory (MFT), variational methods, and belief propagation. These methods typically formulate the problem as minimizing a free energy function of pseudomarginals; they differ in the design of the free energy and in the choice of algorithm used to minimize it. The algorithms can often be interpreted as message passing. In many cases the free energy has a dual formulation, and the algorithms are defined over the dual variables (e.g., the messages in belief propagation). Performance depends on the type of free energy used, specifically on how well it approximates the log partition function of the probability distribution, and on whether there are suitable algorithms for finding its minima.

We start in section (II) by introducing two types of Markov random field models that are often used in computer vision. We proceed to define MFT/variational methods in section (III), whose free energies yield lower bounds on the log partition function, and describe how inference can be done by expectation-maximization, steepest descent, or discrete iterative algorithms. Section (IV) describes message passing algorithms, such as belief propagation and its generalizations, which can be related to free energy functions (and dual variables). Finally, in section (V) we describe how these methods relate to Markov Chain Monte Carlo (MCMC) approaches; this connection gives a different way to think about these methods and can lead to novel algorithms.
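To make the contrast between mean-field and belief propagation updates concrete, here is a minimal Python sketch on a hypothetical toy model: a binary pairwise MRF (Ising model) p(x) ∝ exp(Σ_i h_i x_i + Σ_(i,j) J_ij x_i x_j) with x_i ∈ {−1, +1}. The functions `neighbors`, `exact_marginals`, `mean_field`, and `belief_propagation`, as well as the parameter values, are illustrative assumptions rather than code from the chapter. Mean field iterates the fixed-point equations m_i ← tanh(h_i + Σ_j J_ij m_j) of a fully factorized approximation. Belief propagation applies the sum-product message updates, which give exact marginals on trees such as the chain used here. Brute-force enumeration supplies the ground truth.

```python
import itertools
import math

# Hypothetical toy model (illustration only, not taken from the chapter):
# binary pairwise MRF / Ising model over spins x[i] in {-1, +1},
#   p(x) propto exp( sum_i h[i]*x[i] + sum_{(i,j)} J[(i,j)]*x[i]*x[j] ).
STATES = (-1, 1)


def neighbors(n, J):
    """Adjacency map nbrs[i][j] = J_ij, stored in both directions."""
    nbrs = {i: {} for i in range(n)}
    for (i, j), Jij in J.items():
        nbrs[i][j] = Jij
        nbrs[j][i] = Jij
    return nbrs


def exact_marginals(h, J):
    """Ground truth P(x_i = +1) by enumerating all 2^n states (tiny n only)."""
    n = len(h)
    Z, p_plus = 0.0, [0.0] * n
    for x in itertools.product(STATES, repeat=n):
        log_w = sum(h[i] * x[i] for i in range(n))
        log_w += sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
        w = math.exp(log_w)
        Z += w
        for i in range(n):
            if x[i] == 1:
                p_plus[i] += w
    return [p / Z for p in p_plus]


def mean_field(h, J, sweeps=200):
    """Naive mean field: sequential updates m_i <- tanh(h_i + sum_j J_ij m_j).

    m_i is the mean of x_i under the factorized approximation q(x) = prod_i q_i(x_i).
    """
    n = len(h)
    nbrs = neighbors(n, J)
    m = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            m[i] = math.tanh(h[i] + sum(Jij * m[j] for j, Jij in nbrs[i].items()))
    return [(1 + mi) / 2 for mi in m]  # mean of x_i -> P(x_i = +1)


def belief_propagation(h, J, iters=50):
    """Sum-product BP with parallel message updates (exact on trees)."""
    n = len(h)
    nbrs = neighbors(n, J)
    # msg[(i, j)][xj] is the normalized message from node i to node j.
    msg = {(i, j): {s: 0.5 for s in STATES} for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = {}
            for xj in STATES:
                total = 0.0
                for xi in STATES:
                    # product of messages into i from every neighbour except j
                    incoming = 1.0
                    for k in nbrs[i]:
                        if k != j:
                            incoming *= msg[(k, i)][xi]
                    total += math.exp(h[i] * xi + nbrs[i][j] * xi * xj) * incoming
                out[xj] = total
            norm = sum(out.values())
            new[(i, j)] = {s: v / norm for s, v in out.items()}
        msg = new
    # belief b_i(x_i) propto exp(h_i x_i) * prod_k msg[(k, i)][x_i]
    result = []
    for i in range(n):
        b = {s: math.exp(h[i] * s) for s in STATES}
        for k in nbrs[i]:
            for s in STATES:
                b[s] *= msg[(k, i)][s]
        result.append(b[1] / (b[-1] + b[1]))
    return result


# Usage on a 4-node chain (a tree), with made-up parameters.
h = [0.3, -0.2, 0.5, 0.1]
chain = {(0, 1): 0.4, (1, 2): -0.3, (2, 3): 0.35}
print("exact:", exact_marginals(h, chain))
print("BP:   ", belief_propagation(h, chain))  # agrees with exact on a tree
print("MF:   ", mean_field(h, chain))          # approximate in general
```

On graphs with cycles the same message updates define loopy BP, which returns only approximate beliefs and is not guaranteed to converge. The mean-field updates are applied sequentially because each single-site update minimizes the mean-field free energy over that site's factor, so the free energy never increases. Parallel (synchronous) updates lack this guarantee and can oscillate.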
