A new class of upper bounds on the log partition function

We introduce a new class of upper bounds on the log partition function of a Markov random field (MRF). This quantity plays an important role in various contexts, including approximating marginal distributions, parameter estimation, combinatorial enumeration, statistical decision theory, and large-deviations bounds. Our derivation is based on concepts from convex duality and information geometry: in particular, it exploits mixtures of distributions in the exponential domain, and the Legendre mapping between exponential and mean parameters. In the special case of convex combinations of tree-structured distributions, we obtain a family of variational problems, similar to the Bethe variational problem, but distinguished by the following desirable properties: (i) they are convex and have a unique global optimum; and (ii) the optimum gives an upper bound on the log partition function. This optimum is defined by stationarity conditions very similar to those defining fixed points of the sum-product algorithm or, more generally, any local optimum of the Bethe variational problem. As with sum-product fixed points, the elements of the optimizing argument can be used as approximations to the marginals of the original model. The analysis extends naturally to convex combinations of hypertree-structured distributions, thereby establishing links to Kikuchi approximations and their variants.
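The basic inequality behind these bounds can be illustrated on a toy model. Because the log partition function A(θ) is convex in the exponential parameter θ, writing θ as a convex combination θ = Σ_T ρ(T) θ(T) of tree-structured parameters gives A(θ) ≤ Σ_T ρ(T) A(θ(T)). The Python sketch below checks this on a hypothetical binary pairwise MRF on a 3-node cycle; the model, its parameter values, and the helper names `log_partition` and `tree_mixture_bound` are illustrative assumptions, and every log partition function is computed by brute-force enumeration rather than by message passing. It evaluates the bound for one fixed, naive decomposition {θ(T)} only (each tree rescales its edges by the inverse edge appearance probability). Roughly speaking, the variational problems described above arise from optimizing over such decompositions for fixed weights ρ, so their optimum can only be tighter than this value.

```python
import itertools
import math

# Hypothetical toy model: a binary pairwise MRF (x_s in {0, 1}) on a 3-node cycle.
# The parameter values are illustrative only.
NODES = [0, 1, 2]
EDGES = [(0, 1), (1, 2), (0, 2)]
THETA_NODE = {0: 0.3, 1: -0.2, 2: 0.5}                 # term theta_s * x_s
THETA_EDGE = {(0, 1): 0.8, (1, 2): -0.6, (0, 2): 1.1}  # term theta_st * x_s * x_t


def log_partition(theta_node, theta_edge):
    """Exact log partition function A(theta) by brute-force enumeration."""
    terms = []
    for x in itertools.product([0, 1], repeat=len(NODES)):
        energy = sum(theta_node[s] * x[s] for s in NODES)
        energy += sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
        terms.append(energy)
    m = max(terms)  # log-sum-exp shift for numerical stability
    return m + math.log(sum(math.exp(e - m) for e in terms))


def tree_mixture_bound(theta_node, theta_edge, trees, rho):
    """Upper bound sum_T rho(T) * A(theta(T)), valid because A is convex.

    Each tree keeps all node parameters and rescales its own edges by
    1 / rho_e (rho_e = edge appearance probability), so that
    sum_T rho(T) * theta(T) == theta, as Jensen's inequality requires.
    """
    edge_prob = {e: sum(r for T, r in zip(trees, rho) if e in T) for e in theta_edge}
    bound = 0.0
    for T, r in zip(trees, rho):
        theta_T = {e: (theta_edge[e] / edge_prob[e] if e in T else 0.0) for e in theta_edge}
        bound += r * log_partition(theta_node, theta_T)
    return bound


# The three spanning trees of the cycle (each deletes one edge), mixed uniformly;
# every edge then appears with probability 2/3.
TREES = [set(EDGES) - {e} for e in EDGES]
RHO = [1 / 3, 1 / 3, 1 / 3]

exact = log_partition(THETA_NODE, THETA_EDGE)
upper = tree_mixture_bound(THETA_NODE, THETA_EDGE, TREES, RHO)
print(f"exact A(theta) = {exact:.4f}, tree-mixture bound = {upper:.4f}")
```

With all couplings set to zero, every θ(T) equals θ and the bound is tight; nonzero couplings make the θ(T) differ, and strict convexity of A then makes the inequality strict.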
