Shallow vs. Deep Sum-Product Networks

We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums) through a theoretical analysis that compares deep (multiple hidden layers) and shallow (one hidden layer) architectures. We prove that there exist families of functions that can be represented much more efficiently by a deep network than by a shallow one, i.e., with substantially fewer hidden units. Such results were not available until now; they help motivate recent research on learning deep sum-product networks and, more generally, research in Deep Learning.
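To make the deep vs. shallow comparison concrete, the following is a minimal sketch and not necessarily the construction used in the proofs. It assumes a toy deep network that alternates product and sum layers over adjacent pairs of units, with all sum weights set to 1. The names `deep_network`, `multiply`, and `weighted_sum` are illustrative. The sketch counts the units in the deep network and the monomials in the polynomial it computes; a shallow network that naively expands that polynomial would use one product unit per monomial. The code does not show that this naive expansion is the smallest possible shallow network.

```python
from itertools import product as cartesian

# A polynomial is a dict: monomial (sorted tuple of variable indices) -> coefficient.

def variable(i):
    """Polynomial consisting of the single input variable x_i."""
    return {(i,): 1.0}

def weighted_sum(polys, weights):
    """Sum unit: weighted sum of its input polynomials."""
    out = {}
    for poly, w in zip(polys, weights):
        for mono, coef in poly.items():
            out[mono] = out.get(mono, 0.0) + w * coef
    return out

def multiply(p, q):
    """Product unit: product of two input polynomials."""
    out = {}
    for (m1, c1), (m2, c2) in cartesian(p.items(), q.items()):
        mono = tuple(sorted(m1 + m2))
        out[mono] = out.get(mono, 0.0) + c1 * c2
    return out

def deep_network(n_inputs):
    """Build the toy deep network (assumed architecture, weights fixed to 1).

    Layers alternate product, sum, product, ... over adjacent pairs until one
    unit remains. n_inputs must be a power of two.
    Returns (output polynomial, number of units including the output unit).
    """
    layer = [variable(i) for i in range(n_inputs)]
    units = 0
    use_product = True
    while len(layer) > 1:
        pairs = [(layer[k], layer[k + 1]) for k in range(0, len(layer), 2)]
        if use_product:
            layer = [multiply(a, b) for a, b in pairs]
        else:
            layer = [weighted_sum([a, b], [1.0, 1.0]) for a, b in pairs]
        units += len(layer)
        use_product = not use_product
    return layer[0], units

if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        poly, units = deep_network(n)
        # len(poly) = product units needed by the naive expanded shallow network
        print(n, units, len(poly))
```

In this toy family the deep network uses n - 1 units, while the number of monomials in the expanded polynomial grows much faster with n, which illustrates the kind of gap the abstract refers to.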
