Efficient Computation of Moments in Sum-Product Networks

Bayesian online learning algorithms for Sum-Product Networks (SPNs) need to compute moments of model parameters under the one-step update posterior distribution. The best existing method for computing such moments scales linearly in the size of the SPN only when the SPN is a tree; in general it scales quadratically. We propose a linear-time algorithm that works even when the SPN is a directed acyclic graph (DAG). We achieve this by reducing the moment computation problem to a joint inference problem in SPNs and by exploiting a special structure of the one-step update posterior distribution: it is a multilinear polynomial with exponentially many monomials, so its moments can be evaluated by differentiation. This technique is known as the differential trick. We apply the proposed algorithm to develop a linear-time assumed density filter (ADF) for SPN parameter learning. As an additional contribution, we conduct extensive experiments comparing seven online learning algorithms for SPNs on 20 benchmark datasets. The new linear-time ADF method consistently achieves low runtime because of its efficient moment computation; however, we find that two other methods (CCCP and SMA) typically perform better statistically, while a third (BMM) is comparable to ADF. Interestingly, CCCP can be viewed as implicitly using the same differential trick that we make explicit here. The fact that two of the four fastest methods use this trick suggests that it might find other uses in SPN learning in the future.
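The core of the differential trick can be illustrated with a minimal sketch. It is not the ADF algorithm from the paper: it only shows that one bottom-up pass and one top-down pass over a DAG-structured SPN yield the derivative of the root value with respect to every edge weight. The node encoding, the function name `forward_backward`, and the toy network with its weights are all assumptions made for this illustration.

```python
def forward_backward(nodes, leaf_values):
    """Evaluate an SPN and differentiate its root w.r.t. every sum-edge weight.

    nodes is a list in topological order (children before parents, root last).
    Each node is ("leaf", name), ("prod", [child_idx, ...]) or
    ("sum", [(child_idx, weight), ...]). This encoding is hypothetical.
    """
    n = len(nodes)
    val = [0.0] * n
    # Bottom-up pass: compute the value of every node once.
    for i, (kind, arg) in enumerate(nodes):
        if kind == "leaf":
            val[i] = leaf_values[arg]
        elif kind == "prod":
            v = 1.0
            for c in arg:
                v *= val[c]
            val[i] = v
        else:  # sum node
            val[i] = sum(w * val[c] for c, w in arg)

    # Top-down pass: grad[i] accumulates d(root)/d(val[i]) over all parents,
    # which is what makes shared (multi-parent) nodes in a DAG work.
    grad = [0.0] * n
    grad[-1] = 1.0
    weight_grad = {}
    for i in range(n - 1, -1, -1):
        kind, arg = nodes[i]
        if kind == "sum":
            for c, w in arg:
                grad[c] += grad[i] * w
                weight_grad[(i, c)] = grad[i] * val[c]
        elif kind == "prod":
            for k, c in enumerate(arg):
                # Product of the sibling values (simple, not optimized).
                others = 1.0
                for j, d in enumerate(arg):
                    if j != k:
                        others *= val[d]
                grad[c] += grad[i] * others
    return val[-1], weight_grad


# Toy DAG over two binary variables; node 4 is shared by products 5 and 6.
nodes = [
    ("leaf", "x1"), ("leaf", "not_x1"), ("leaf", "x2"), ("leaf", "not_x2"),
    ("sum", [(2, 0.3), (3, 0.7)]),   # node 4: distribution over X2
    ("prod", [0, 4]),                # node 5
    ("prod", [1, 4]),                # node 6
    ("sum", [(5, 0.6), (6, 0.4)]),   # node 7: root
]
# Evidence X1 = 1, X2 = 0, encoded through the indicator leaves.
evidence = {"x1": 1.0, "not_x1": 0.0, "x2": 0.0, "not_x2": 1.0}
root_value, weight_grads = forward_backward(nodes, evidence)
```

Because the network polynomial is multilinear in the weights, each entry of `weight_grads` is the coefficient that multiplies the corresponding weight in the root value, which is the kind of quantity a moment computation can reuse instead of re-evaluating the network once per weight.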
