Probabilistic Circuits for Variational Inference in Discrete Graphical Models

Inference in discrete graphical models with variational methods is difficult because gradients of the Evidence Lower Bound (ELBO) cannot be re-parameterized. Many sampling-based methods have been proposed for estimating these gradients, but they suffer from high bias or variance. In this paper, we propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs), to compute ELBO gradients exactly (without sampling) for a certain class of densities. In particular, we show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically. To scale to graphical models with thousands of variables, we develop an efficient and effective construction of selective-SPNs with size $O(kn)$, where $n$ is the number of variables and $k$ is an adjustable hyperparameter. We demonstrate our approach on three types of graphical models -- Ising models, Latent Dirichlet Allocation, and factor graphs from the UAI Inference Competition. Selective-SPNs give a better lower bound than mean-field and structured mean-field, and are competitive with approximations that do not provide a lower bound, such as Loopy Belief Propagation and Tree-Reweighted Belief Propagation. Our results show that probabilistic circuits are promising tools for variational inference in discrete graphical models as they combine tractability and expressivity.
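To make the claim about analytic ELBOs concrete, the following is a minimal sketch and not the construction from the paper. It assumes a small Ising model with spins in $\{-1,+1\}$ as the polynomial target, and a hypothetical two-component variational mixture in which each component fixes the value of the first spin, so the components have disjoint supports (the selectivity property). Under those assumptions the expected log-density and the entropy both have closed forms, which the code checks against brute-force enumeration. All function and variable names are illustrative.

```python
import itertools
import numpy as np


def _neg_plogp(probs):
    """Return -sum(p * log p), treating 0 * log 0 as 0."""
    probs = np.asarray(probs, dtype=float)
    safe = np.where(probs > 0, probs, 1.0)
    return -np.sum(probs * np.log(safe))


def selective_mixture_elbo(J, h, w, p):
    """Closed-form ELBO for an Ising target under a selective mixture.

    Target (unnormalized): log p~(x) = sum_{i<j} J[i,j] x_i x_j + sum_i h[i] x_i.
    q(x) = sum_c w[c] * prod_i Bernoulli(x_i = +1; p[c, i]), with supports
    made disjoint by fixing x_0 differently in each component (assumption).
    """
    m = 2.0 * p - 1.0            # per-component means E[x_i | c]
    Ju = np.triu(J, k=1)         # keep each pair (i < j) once
    # Within a component the spins are independent, so E[x_i x_j] = m_i m_j.
    energy = sum(w[c] * (m[c] @ Ju @ m[c] + h @ m[c]) for c in range(len(w)))
    # Disjoint supports: H(q) = H(w) + sum_c w[c] * H(q_c).
    entropy = _neg_plogp(w) + sum(
        w[c] * (_neg_plogp(p[c]) + _neg_plogp(1.0 - p[c])) for c in range(len(w))
    )
    return energy + entropy


def brute_force(J, h, w, p):
    """Enumerate all 2^n states; return (ELBO, log Z) for comparison."""
    Ju = np.triu(J, k=1)
    elbo, z = 0.0, 0.0
    for bits in itertools.product([-1.0, 1.0], repeat=len(h)):
        x = np.array(bits)
        log_p = x @ Ju @ x + h @ x
        z += np.exp(log_p)
        q = float(w @ np.where(x > 0, p, 1.0 - p).prod(axis=1))
        if q > 0:
            elbo += q * (log_p - np.log(q))
    return elbo, np.log(z)


rng = np.random.default_rng(0)
n = 6
J = rng.normal(size=(n, n))
h = rng.normal(size=n)
w = np.array([0.3, 0.7])
p = rng.uniform(0.1, 0.9, size=(2, n))
p[0, 0], p[1, 0] = 0.0, 1.0      # component 0 has x_0 = -1, component 1 has x_0 = +1

exact = selective_mixture_elbo(J, h, w, p)
brute, log_z = brute_force(J, h, w, p)
print(f"closed-form ELBO: {exact:.6f}")
print(f"enumerated ELBO:  {brute:.6f}")
print(f"log Z:            {log_z:.6f}")
```

The closed form costs $O(kn^2)$ for this dense pairwise target, whereas the enumeration is exponential in $n$. Because the closed form is differentiable in `w` and `p`, its gradient could in principle be obtained exactly by automatic differentiation rather than by sampling.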
