Tree-based reparameterization framework for analysis of sum-product and related algorithms

We present a tree-based reparameterization (TRP) framework that provides a new conceptual view of a large class of algorithms for computing approximate marginals in graphs with cycles. This class includes the belief propagation (BP) or sum-product algorithm as well as variations and extensions of BP. Algorithms in this class can be formulated as a sequence of reparameterization updates, each of which entails refactorizing a portion of the distribution corresponding to an acyclic subgraph (i.e., a tree, or more generally, a hypertree). The ultimate goal is to obtain an alternative but equivalent factorization using functions that represent (exact or approximate) marginal distributions on cliques of the graph. Our framework highlights an important property of the sum-product algorithm and the larger class of reparameterization algorithms: the original distribution on the graph with cycles is not changed. The perspective of tree-based updates gives rise to a simple and intuitive characterization of the fixed points in terms of tree consistency. We develop interpretations of these results in terms of information geometry. The invariance of the distribution, in conjunction with the fixed-point characterization, enables us to derive an exact expression for the difference between the true marginals on an arbitrary graph with cycles and the approximations provided by belief propagation. More broadly, our analysis applies to any algorithm that minimizes the Bethe free energy. We also develop bounds on the approximation error, which clarify the conditions that govern the accuracy of these approximations. Finally, we show how the reparameterization perspective extends naturally to generalizations of BP (e.g., Kikuchi (1951) approximations and variants) via the notion of hypertree reparameterization.
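The invariance and tree-consistency properties described above can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's implementation. It uses a hypothetical binary pairwise model on a 4-cycle with random positive potentials and runs ordinary synchronous sum-product, one member of the class described above, rather than a tree-based TRP schedule. It then forms node and edge pseudo-marginals (written T_s and T_st here) and checks three things. First, it checks that the product over nodes of T_s times the product over edges of T_st / (T_s T_t) is proportional to the original distribution. Second, it checks that at the fixed point each edge pseudo-marginal is consistent with its node pseudo-marginals. Third, it measures how far the BP marginals are from the exact ones, which it computes by brute-force enumeration. The names `run_bp`, `beliefs`, `reparam`, and the model itself are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical toy model (not from the paper): binary pairwise MRF on a 4-cycle.
rng = np.random.default_rng(0)
N = 4
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
node_pot = {s: np.exp(rng.normal(size=2)) for s in range(N)}
edge_pot = {e: np.exp(0.5 * rng.normal(size=(2, 2))) for e in EDGES}
nbrs = {s: [] for s in range(N)}
for s, t in EDGES:
    nbrs[s].append(t)
    nbrs[t].append(s)


def psi(s, t):
    """Edge potential as a matrix indexed [x_s, x_t]."""
    return edge_pot[(s, t)] if (s, t) in edge_pot else edge_pot[(t, s)].T


def p_unnorm(x):
    """Original (unnormalized) distribution on the graph with cycles."""
    val = np.prod([node_pot[s][x[s]] for s in range(N)])
    return val * np.prod([edge_pot[(s, t)][x[s], x[t]] for s, t in EDGES])


def run_bp(iters=2000, tol=1e-12):
    """Synchronous sum-product; msg[(t, s)] is the message from t to s, a vector over x_s."""
    msg = {(t, s): np.ones(2) for s in range(N) for t in nbrs[s]}
    for _ in range(iters):
        new = {}
        for (t, s) in msg:
            incoming = node_pot[t].copy()
            for u in nbrs[t]:
                if u != s:
                    incoming *= msg[(u, t)]
            m = psi(s, t) @ incoming  # sum over x_t
            new[(t, s)] = m / m.sum()
        delta = max(np.abs(new[k] - msg[k]).max() for k in msg)
        msg = new
        if delta < tol:
            break
    return msg


def beliefs(msg):
    """Node and edge pseudo-marginals T_s, T_st built from one set of messages."""
    T_node, T_edge = {}, {}
    for s in range(N):
        b = node_pot[s] * np.prod([msg[(u, s)] for u in nbrs[s]], axis=0)
        T_node[s] = b / b.sum()
    for s, t in EDGES:
        # Messages into s excluding the one from t, and into t excluding the one from s.
        ms = node_pot[s] * np.prod([msg[(u, s)] for u in nbrs[s] if u != t], axis=0)
        mt = node_pot[t] * np.prod([msg[(u, t)] for u in nbrs[t] if u != s], axis=0)
        b = ms[:, None] * edge_pot[(s, t)] * mt[None, :]
        T_edge[(s, t)] = b / b.sum()
    return T_node, T_edge


def reparam(x, T_node, T_edge):
    """q(x) = prod_s T_s(x_s) * prod_st T_st(x_s, x_t) / (T_s(x_s) T_t(x_t))."""
    val = np.prod([T_node[s][x[s]] for s in range(N)])
    for s, t in EDGES:
        val *= T_edge[(s, t)][x[s], x[t]] / (T_node[s][x[s]] * T_node[t][x[t]])
    return val


msg = run_bp()
T_node, T_edge = beliefs(msg)
configs = list(itertools.product([0, 1], repeat=N))

# 1. Invariance: q(x) / p(x) is the same constant for every configuration x.
ratios = np.array([reparam(x, T_node, T_edge) / p_unnorm(x) for x in configs])
invariant = np.allclose(ratios, ratios[0])

# 2. Tree (edge) consistency at the fixed point: marginalizing T_st gives T_s and T_t.
consistent = all(np.allclose(T_edge[(s, t)].sum(axis=1), T_node[s]) and
                 np.allclose(T_edge[(s, t)].sum(axis=0), T_node[t]) for s, t in EDGES)

# 3. Approximation error of BP relative to exact marginals from enumeration.
Z = sum(p_unnorm(x) for x in configs)
exact = {s: np.array([sum(p_unnorm(x) for x in configs if x[s] == v) for v in (0, 1)]) / Z
         for s in range(N)}
max_err = max(np.abs(T_node[s] - exact[s]).max() for s in range(N))
print(invariant, consistent, max_err)
```

The invariance check does not depend on convergence. Each message appears once in a node pseudo-marginal and once in the denominator of an edge term, so all messages cancel and leave the original potentials up to a constant. Only the consistency check requires the messages to have reached a fixed point.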

[1] R. Kikuchi. A Theory of Cooperative Phenomena, 1951.

[2] G. Fournet. Theory of Cooperative Phenomena, 1952.

[3] Robert G. Gallager et al. Low-density parity-check codes, 1962, IRE Trans. Inf. Theory.

[4] O. Barndorff-Nielsen. Information and Exponential Families, 1970.

[5] J. Darroch et al. Generalized Iterative Scaling for Log-Linear Models, 1972.

[6] G. Grimmett. A Theorem about Random Fields, 1973.

[7] J. Besag. Spatial Interaction and the Statistical Analysis of Lattice Systems, 1974.

[8] I. Csiszár. I-Divergence Geometry of Probability Distributions and Minimization Problems, 1975.

[9] 丸山 徹. On a few developments in convex analysis (in Japanese), 1977.

[10] Dieter Jungnickel et al. Graphs, Networks, and Algorithms, 1980.

[11] N. N. Chentsov. Statistical Decision Rules and Optimal Inference, 1982.

[12] R. Baxter. Exactly Solved Models in Statistical Mechanics, 1982.

[13] S. Amari. Differential Geometry of Curved Exponential Families: Curvatures and Information Loss, 1982.

[14] Shun-ichi Amari et al. Differential Geometry of Statistical Inference, 1983.

[15] R. Stanley. What Is Enumerative Combinatorics, 1986.

[16] 甘利 俊一. Differential Geometry in Statistical Inference, 1987.

[17] David J. Spiegelhalter et al. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems, 1990.

[18] Judea Pearl et al. Probabilistic Reasoning in Intelligent Systems, 1988.

[19] R. Baxter. Solving Models in Statistical Mechanics, 1989.

[20] Gregory F. Cooper et al. The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks, 1990, Artif. Intell.

[21] Thomas M. Cover et al. Elements of Information Theory, 2005.

[22] Michèle Basseville et al. Modeling and Estimation of Multiresolution Stochastic Processes, 1992, IEEE Trans. Inf. Theory.

[23] Shun-ichi Amari et al. Information Geometry of Boltzmann Machines, 1992, IEEE Trans. Neural Networks.

[24] Kazuyuki Tanaka et al. Cluster Variation Method and Image Restoration Problem, 1995.

[25] Peter C. Doerschuk et al. Tree Approximations to Markov Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[26] Michael I. Jordan et al. Recursive Algorithms for Approximating Probabilities in Graphical Models, 1996, NIPS.

[27] Y. Censor et al. Parallel Optimization: Theory, Algorithms, and Applications, 1997.

[28] Y. Censor et al. Parallel Optimization: Theory, 1997.

[29] Jung-Fu Cheng et al. Turbo Decoding as an Instance of Pearl's "Belief Propagation" Algorithm, 1998, IEEE J. Sel. Areas Commun.

[30] John B. Anderson et al. Tailbiting MAP Decoders, 1998, IEEE J. Sel. Areas Commun.

[31] Michael I. Jordan. Learning in Graphical Models, 1999, NATO ASI Series.

[32] Brendan J. Frey et al. Iterative Decoding of Compound Codes by Probability Propagation in Graphical Models, 1998, IEEE J. Sel. Areas Commun.

[33] R. J. McEliece et al. Iterative Decoding on Graphs with a Single Cycle, 1998, Proc. 1998 IEEE Int. Symp. Information Theory.

[34] Michael I. Jordan et al. Loopy Belief Propagation for Approximate Inference: An Empirical Study, 1999, UAI.

[35] Benjamin Van Roy et al. An Analysis of Turbo Decoding with Gaussian Densities, 1999, NIPS.

[36] Robert J. McEliece et al. The Generalized Distributive Law, 2000, IEEE Trans. Inf. Theory.

[37] Yair Weiss et al. Correctness of Local Probability Propagation in Graphical Models with Loops, 2000, Neural Computation.

[38] Martin J. Wainwright et al. Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles, 2000, NIPS.

[39] Thomas J. Richardson et al. The Geometry of Turbo-Decoding Dynamics, 2000, IEEE Trans. Inf. Theory.

[40] W. Freeman et al. Generalized Belief Propagation, 2000, NIPS.

[41] N. Čencov. Statistical Decision Rules and Optimal Inference, 2000.

[42] William T. Freeman et al. Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology, 1999, Neural Computation.

[43] A. Yuille. A Double-Loop Algorithm to Minimize the Bethe and Kikuchi Free Energies, 2001.

[44] Yee Whye Teh et al. Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation, 2001, UAI.

[45] Gordon F. Royle et al. Algebraic Graph Theory, 2001, Graduate Texts in Mathematics.

[46] Hilbert J. Kappen et al. A Tighter Bound for Graphical Models, 2001, Neural Computation.

[47] Tom Minka et al. A Family of Algorithms for Approximate Bayesian Inference, 2001.

[48] G. Forney et al. Iterative Decoding of Tail-Biting Trellises and Connections with Symbolic Dynamics, 2001.

[49] Benjamin Van Roy et al. An Analysis of Belief Propagation on the Turbo Decoding Graph with Gaussian Densities, 2001, IEEE Trans. Inf. Theory.

[50] Sekhar Tatikonda et al. Loopy Belief Propagation and Gibbs Measures, 2002, UAI.

[51] Martin J. Wainwright et al. Stochastic Processes on Graphs with Cycles: Geometric and Variational Approaches, 2002.

[52] Payam Pakzad et al. Belief Propagation and Statistical Physics, 2002.

[53] William T. Freeman et al. Understanding Belief Propagation and Its Generalizations, 2003.

[54] Robert J. McEliece et al. Belief Propagation on Partially Ordered Sets, 2003, Mathematical Systems Theory in Biology, Communications, Computation, and Finance.

[55] Martin J. Wainwright et al. Tree Consistency and Bounds on the Performance of the Max-Product Algorithm and Its Generalizations, 2004, Stat. Comput.

[56] William T. Freeman et al. Learning Low-Level Vision, 1999, Proc. Seventh IEEE Int. Conf. Computer Vision.

[57] Martin J. Wainwright et al. Embedded Trees: Estimation of Gaussian Processes on Graphs with Cycles, 2004, IEEE Trans. Signal Process.

[58] William T. Freeman et al. Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms, 2005, IEEE Trans. Inf. Theory.

[59] Martin J. Wainwright et al. A New Class of Upper Bounds on the Log Partition Function, 2002, IEEE Trans. Inf. Theory.

[60] K. Schittkowski et al. Nonlinear Programming, 2022.