Divergence measures and message passing

This paper presents a unifying view of message-passing algorithms as methods to approximate a complex Bayesian network by a simpler network with minimum information divergence. In this view, the difference between mean-field methods and belief propagation is not the amount of structure they model, but only the measure of loss they minimize ('exclusive' versus 'inclusive' Kullback-Leibler divergence). In each case, message passing arises by minimizing a localized version of the divergence, local to each factor. By examining these divergence measures, we can intuit the types of solution they prefer (symmetry-breaking, for example) and their suitability for different tasks. Furthermore, by considering a wider variety of divergence measures (such as alpha-divergences), we can achieve different complexity and performance goals.
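For concreteness, here is a minimal sketch of the alpha-divergence family mentioned above, in one common parameterization (the symbols p for the exact distribution and q for its approximation are assumptions of this sketch, not notation taken from the abstract):

D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)} \int \Big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha}\, q(x)^{1-\alpha} \Big)\, dx .

The limit \alpha \to 0 recovers the 'exclusive' divergence \mathrm{KL}(q \,\|\, p), which tends to break symmetry and lock onto a single mode, while \alpha \to 1 recovers the 'inclusive' divergence \mathrm{KL}(p \,\|\, q), which spreads mass to cover all modes; intermediate values of \alpha trade off between these behaviors.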
