Exact and approximate inference in graphical models: variable elimination and beyond

Probabilistic graphical models offer a powerful framework to account for the dependence structure between variables, which is represented as a graph. This dependence may render inference tasks such as computing normalizing constants, marginalization, or optimization intractable. The objective of this paper is to review techniques, borrowed from optimization and computer science, that exploit the graph structure for exact inference. These techniques are not yet standard in the statistician's toolkit, and we specify the conditions under which they are efficient in practice. They are built on the principle of variable elimination, whose complexity is dictated in an intricate way by the order in which variables are eliminated in the graph. The so-called treewidth of the graph characterizes this algorithmic complexity: low-treewidth graphs can be processed efficiently. Algorithmic solutions derived from variable elimination and the notion of treewidth are illustrated on problems of treewidth computation and inference in challenging benchmarks from optimization competitions. We also review how efficient techniques for approximate inference, such as loopy belief propagation and variational approaches, can be linked to variable elimination, and we illustrate them in the context of Expectation-Maximisation procedures for parameter estimation in coupled Hidden Markov Models.
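As a minimal sketch of the variable elimination principle (not the paper's implementation), the Python code below computes the normalizing constant of a small discrete model and records the largest scope created along the way. The factor representation (a scope tuple paired with a dictionary table), the function names `eliminate`, `partition_function` and `brute_force`, and the star-shaped toy model are all assumptions made for illustration.

```python
from itertools import product


def eliminate(factors, var, domains):
    """Sum `var` out of the product of all factors whose scope contains it."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    # The new factor's scope is every other variable sharing a factor with `var`.
    scope = tuple(sorted({v for s, _ in touching for v in s if v != var}))
    table = {}
    for assign in product(*(domains[v] for v in scope)):
        ctx = dict(zip(scope, assign))
        total = 0.0
        for x in domains[var]:
            ctx[var] = x
            term = 1.0
            for s, t in touching:
                term *= t[tuple(ctx[v] for v in s)]
            total += term
        table[assign] = total
    return rest + [(scope, table)], len(scope)


def partition_function(factors, order, domains):
    """Eliminate variables in `order`; return (Z, largest intermediate scope)."""
    width = 0
    for var in order:
        factors, k = eliminate(factors, var, domains)
        width = max(width, k)
    z = 1.0
    for _, t in factors:  # every remaining factor now has an empty scope
        z *= t[()]
    return z, width


def brute_force(factors, domains):
    """Reference value: enumerate every joint assignment (exponential cost)."""
    names = sorted(domains)
    z = 0.0
    for assign in product(*(domains[v] for v in names)):
        ctx = dict(zip(names, assign))
        term = 1.0
        for s, t in factors:
            term *= t[tuple(ctx[v] for v in s)]
        z += term
    return z


# Hypothetical toy model: a star graph with centre 0 and binary leaves 1..4.
domains = {v: (0, 1) for v in range(5)}
pair = {(a, b): 2.0 if a == b else 1.0 for a in (0, 1) for b in (0, 1)}
factors = [((0, leaf), pair) for leaf in range(1, 5)]

z_leaves_first, w_leaves_first = partition_function(factors, [1, 2, 3, 4, 0], domains)
z_centre_first, w_centre_first = partition_function(factors, [0, 1, 2, 3, 4], domains)
```

Both orders return the same normalizing constant, but eliminating the leaves first only ever creates factors over one variable, whereas eliminating the centre first creates a factor over all four leaves. This is the order dependence that the notion of treewidth captures.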
