Approximate inference using conditional entropy decompositions

We introduce a novel method for estimating the partition function and marginals of distributions defined using graphical models. The method uses the entropy chain rule to obtain an upper bound on the entropy of a distribution given marginal distributions over subsets of its variables. The structure of the bound is determined by a permutation, or elimination order, of the model variables. Optimizing this bound results in an upper bound on the log partition function, and also yields an approximation to the model marginals. The optimization problem is convex, and is in fact a dual of a geometric program. We evaluate the method on a 2D Ising model over a wide range of parameters, and show that it compares favorably with previous methods in terms of both the tightness of the partition function bound and the accuracy of the marginals.

Graphical models are a powerful tool for representing multivariate distributions, and have been used with considerable success in numerous domains, from coding algorithms to image processing. Although graphical models yield compact representations of distributions, it is often very difficult to infer simple properties of these distributions, such as the marginals over single variables or the MAP assignment. This difficulty stems from the fact that these problems involve enumeration over an exponential number of assignments, and it has motivated extensive research into approximate inference algorithms. Another problem, which turns out to play a key role in developing inference algorithms, is the calculation of the partition function. Recent works (Wainwright & Jordan, 2003; Yedidia et al., 2005) have illustrated that a variational view of partition function estimation can be used to analyze most previously introduced approximate inference algorithms, such as mean field, belief propagation (BP), and the tree re-weighting (TRW) framework (Wainwright et al., 2005).
The above analyses emphasize that a key ingredient in most approximate inference algorithms is the estimation of the entropy of a graphical model given marginals over subsets of its variables. This approximation may be an upper bound on the true entropy, as in the TRW framework, or one that is not guaranteed to be a bound, as in the Kikuchi entropies used in Generalized Belief Propagation (GBP) (Yedidia et al., 2005). Another important property of an entropy approximation is convexity: the TRW entropies are convex, whereas those of GBP are not necessarily convex. In the current work, we introduce a novel upper bound on graphical model entropy, which results in a convex upper bound on the partition function. The bound is constructed by decomposing the full model entropy into a sum of conditional entropies using the entropy chain rule (Cover & Thomas, 1991), and then discarding some of the conditioning variables, thus potentially increasing the entropy. This entropy bound is then plugged into the variational formulation, resulting in a convex optimization problem that yields an upper bound on the partition function. As with previous methods (Yedidia et al., 2005; Wainwright et al., 2005), a byproduct of this optimization problem is a set of pseudo-marginals which can be used to approximate the true model marginals. We evaluate our Conditional Entropy Decomposition (CED) method on a two-dimensional Ising grid, and show that it performs well over a wide range of parameters, improving on both TRW and belief propagation.

1 Definitions and Notation

We shall be interested in multivariate distributions over a set of variables x = {x1, . . . , xn}. Consider a set C of subsets C ⊆ {1, . . . , n}. Denote by xC an assignment to the variables xi such that i ∈ C. A distribution over x will be parameterized using functions θ(xC). We denote by θ the vector of all parameters for C ∈ C. These can be used to define an exponential distribution over x given by

p(x; θ) = (1/Z(θ)) exp( Σ_{C∈C} θ(xC) ),

where the partition function Z(θ) = Σ_x exp( Σ_{C∈C} θ(xC) ) normalizes the distribution.
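The chain-rule bound underlying CED can be checked numerically on a toy example. The sketch below (a minimal illustration, not the paper's optimized procedure: the random joint table and the choice of which conditioning variable to drop are assumptions made for demonstration) writes the exact entropy of a three-variable distribution via the chain rule, H(x1) + H(x2|x1) + H(x3|x1,x2), then drops x1 from the last conditioning set. Since conditioning can only reduce entropy, the resulting sum H(x1) + H(x2|x1) + H(x3|x2) upper-bounds the true entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy joint distribution over three binary variables.
p = rng.random((2, 2, 2))
p /= p.sum()

def H(q):
    """Shannon entropy (in nats) of a probability table."""
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# Exact entropy of the joint; by the chain rule this equals
# H(x1) + H(x2|x1) + H(x3|x1,x2).
H_exact = H(p)

# Marginals needed for the decomposition.
p1 = p.sum(axis=(1, 2))   # p(x1)
p2 = p.sum(axis=(0, 2))   # p(x2)
p12 = p.sum(axis=2)       # p(x1, x2)
p23 = p.sum(axis=0)       # p(x2, x3)

# Conditional entropies via H(A|B) = H(A,B) - H(B).
H1 = H(p1)
H2_given_1 = H(p12) - H(p1)
H3_given_2 = H(p23) - H(p2)   # x1 dropped from the conditioning set

# CED-style upper bound: discarding a conditioning variable
# can only increase the conditional entropy.
ced_bound = H1 + H2_given_1 + H3_given_2
assert ced_bound >= H_exact - 1e-12
```

In the full method, the elimination order and the retained conditioning subsets are optimization choices; this sketch only verifies the monotonicity that makes the decomposition an upper bound.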

[1] G. Reinelt et al. On the acyclic subgraph polytope. Math. Program., 1985.

[2] W. T. Freeman et al. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 2005.

[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. 2005.

[4] H. J. Kappen et al. Approximate Inference and Constrained Optimization. UAI, 2002.

[5] D. P. Bertsekas. Nonlinear Programming. 1997.

[6] M. J. Wainwright et al. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 2002.

[7] M. J. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Found. Trends Mach. Learn., 2008.

[8] M. Deza and M. Laurent. Geometry of Cuts and Metrics. Algorithms and Combinatorics, 2009.