Visualizing and understanding Sum-Product Networks

Sum-Product Networks (SPNs) are deep tractable probabilistic models in which several kinds of inference queries can be answered exactly and in tractable time. They have largely been used as black-box density estimators, assessed by comparing their likelihood scores on different tasks. In this paper we explore and exploit the inner representations learned by SPNs. By taking a closer look at the inner workings of SPNs, we aim to better understand what representations they learn and how meaningful these are, as in a classic Representation Learning framework. We first propose an interpretation of SPNs as Multi-Layer Perceptrons; we then devise several criteria to extract representations from SPNs; finally, we empirically evaluate them in several (semi-)supervised tasks, showing they are competitive against classical feature extractors like RBMs and DBNs, and against deep probabilistic autoencoders like MADEs and VAEs.
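To make the idea of reading representations out of an SPN concrete, the following is a minimal sketch, not the extraction criteria proposed in the paper. It assumes a hypothetical toy SPN over two binary variables with made-up Bernoulli leaves and weights, evaluates it bottom-up layer by layer (much like a forward pass through a feed-forward network), and uses the log-outputs of the inner product nodes as a feature vector. The function names `spn_activations` and `embed` are illustrative only.

```python
import math

# Hypothetical toy SPN over two binary variables (X0, X1).
# Structure, leaf parameters, and weights are invented for illustration.

def bernoulli(p, x):
    """Probability of x under a Bernoulli leaf with P(x = 1) = p."""
    return p if x == 1 else 1.0 - p

def spn_activations(x):
    """Evaluate the toy SPN bottom-up and return the output of every node."""
    x0, x1 = x
    # Leaf layer: two Bernoulli leaves per variable.
    l0a, l0b = bernoulli(0.9, x0), bernoulli(0.2, x0)
    l1a, l1b = bernoulli(0.8, x1), bernoulli(0.3, x1)
    # Product layer: each product multiplies leaves over disjoint scopes.
    p_a = l0a * l1a
    p_b = l0b * l1b
    # Root sum node: a mixture with non-negative weights that sum to 1.
    root = 0.6 * p_a + 0.4 * p_b
    return {"leaves": [l0a, l0b, l1a, l1b], "products": [p_a, p_b], "root": root}

def embed(x):
    """One possible (assumed) embedding: log-outputs of the inner product nodes."""
    return [math.log(v) for v in spn_activations(x)["products"]]

if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, spn_activations(x)["root"], embed(x))
```

The root value is the exact probability of the input under the toy model, while the intermediate node outputs are the kind of inner activations that could be passed to a downstream classifier in a (semi-)supervised task.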
