On the Sample Complexity of Learning Sum-Product Networks

Sum-Product Networks (SPNs) can be regarded as a form of deep graphical model that compactly represents deeply factored and mixed distributions. An SPN is a rooted directed acyclic graph (DAG) consisting of a set of leaves (corresponding to base distributions), a set of sum nodes (which represent mixtures of their children's distributions), and a set of product nodes (which represent products of their children's distributions). In this work, we initiate the study of the sample complexity of PAC-learning the set of distributions that correspond to SPNs. We show that the sample complexity of learning tree-structured SPNs with the usual types of leaves (i.e., Gaussian or discrete) grows at most linearly (up to logarithmic factors) with the number of parameters of the SPN. More specifically, we show that the class of distributions corresponding to tree-structured Gaussian SPNs with $k$ mixing weights and $e$ ($d$-dimensional Gaussian) leaves can be learned within total variation error $\epsilon$ using at most $\widetilde{O}\left(\frac{ed^2+k}{\epsilon^2}\right)$ samples. A similar result holds for tree-structured SPNs with discrete leaves. We obtain these upper bounds using the recently proposed notion of distribution compression schemes. In particular, we show that if a (base) class of distributions $\mathcal{F}$ admits an "efficient" compression, then the class of tree-structured SPNs with leaves from $\mathcal{F}$ also admits an efficient compression.
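To make the definitions above concrete, the following is a minimal Python sketch of a tree-structured SPN, written for illustration only and not taken from the paper. It assumes univariate Gaussian leaves (so $d = 1$), and the names `Leaf`, `Product`, `Sum`, `density`, `count_params`, and `leading_term` are all hypothetical. The function `leading_term` evaluates only the expression $(ed^2+k)/\epsilon^2$ inside the $\widetilde{O}(\cdot)$; it drops the constants and logarithmic factors, so it should be read as a scaling indicator rather than an actual sample count.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Univariate Gaussian base distribution over variable index `var` (assumes d = 1)."""
    var: int
    mean: float
    std: float


@dataclass
class Product:
    """Product of the children's distributions (children cover disjoint variables)."""
    children: List["Node"]


@dataclass
class Sum:
    """Mixture of the children's distributions with the given mixing weights."""
    weights: List[float]
    children: List["Node"]


Node = Union[Leaf, Product, Sum]


def density(node: Node, x: List[float]) -> float:
    """Evaluate the density represented by the (sub-)SPN rooted at `node` at point x."""
    if isinstance(node, Leaf):
        z = (x[node.var] - node.mean) / node.std
        return math.exp(-0.5 * z * z) / (node.std * math.sqrt(2.0 * math.pi))
    if isinstance(node, Product):
        return math.prod(density(child, x) for child in node.children)
    return sum(w * density(child, x) for w, child in zip(node.weights, node.children))


def count_params(node: Node):
    """Return (k, e): the number of mixing weights and the number of leaves."""
    if isinstance(node, Leaf):
        return 0, 1
    k = len(node.weights) if isinstance(node, Sum) else 0
    e = 0
    for child in node.children:
        child_k, child_e = count_params(child)
        k += child_k
        e += child_e
    return k, e


def leading_term(e: int, d: int, k: int, eps: float) -> float:
    """(e*d^2 + k) / eps^2, ignoring the constants and log factors hidden by O-tilde."""
    return (e * d ** 2 + k) / eps ** 2


# A toy SPN over two variables: a mixture of two fully factored Gaussians.
spn = Sum(
    weights=[0.3, 0.7],
    children=[
        Product([Leaf(0, 0.0, 1.0), Leaf(1, 0.0, 1.0)]),
        Product([Leaf(0, 2.0, 1.0), Leaf(1, -1.0, 0.5)]),
    ],
)

k, e = count_params(spn)
print("k =", k, "e =", e)
print("density at (0, 0):", density(spn, [0.0, 0.0]))
print("leading term for eps = 0.1:", leading_term(e, d=1, k=k, eps=0.1))
```

In this toy network the root is a sum node with two mixing weights and each product node has two leaves, so `count_params` returns $k = 2$ and $e = 4$. Halving $\epsilon$ multiplies the leading term by four, while adding leaves or mixing weights increases it only linearly, which is the scaling behavior the stated bound describes.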
