Paper Information - Shallow vs. Deep Sum-Product Networks

Shallow vs. Deep Sum-Product Networks

We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums) through a theoretical analysis that compares deep (multiple hidden layers) and shallow (one hidden layer) architectures. We prove that there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e., with substantially fewer hidden units. Such results were not previously available; they help motivate recent research on learning deep sum-product networks and, more generally, research in Deep Learning.
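To make the deep-versus-shallow comparison concrete, the following is a minimal sketch, not the paper's own construction or proof. It assumes a toy network over disjoint inputs whose layers alternate between pairwise product units and pairwise weighted-sum units (all weights set to 1.0), and it assumes the shallow counterpart is a single hidden layer of product units feeding one weighted-sum output, so that it needs one hidden unit per monomial of the expanded polynomial. The function names `multiply`, `weighted_sum`, and `deep_network` are hypothetical and chosen only for this illustration.

```python
from itertools import product as cartesian

# A polynomial is stored as {monomial: coefficient}, where a monomial is a
# frozenset of input indices. This works here because every input feeds
# exactly one path, so all monomials are multilinear.


def multiply(p, q):
    """Product unit: multiply two polynomials over disjoint variables."""
    out = {}
    for (m1, c1), (m2, c2) in cartesian(p.items(), q.items()):
        m = m1 | m2  # union of variable sets is the product monomial
        out[m] = out.get(m, 0.0) + c1 * c2
    return out


def weighted_sum(p, q, wp=1.0, wq=1.0):
    """Sum unit: weighted sum of two polynomials (weights are assumptions)."""
    out = {m: wp * c for m, c in p.items()}
    for m, c in q.items():
        out[m] = out.get(m, 0.0) + wq * c
    return out


def deep_network(n_inputs):
    """Build the toy deep network; n_inputs must be a power of two.

    Returns (polynomial computed by the output unit, number of units used).
    """
    layer = [{frozenset([i]): 1.0} for i in range(n_inputs)]
    units = 0
    use_product = True  # first hidden layer is made of product units
    while len(layer) > 1:
        op = multiply if use_product else weighted_sum
        layer = [op(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        units += len(layer)
        use_product = not use_product
    return layer[0], units


if __name__ == "__main__":
    for n in (4, 16, 64):
        poly, units = deep_network(n)
        # len(poly) = product units a one-hidden-layer network would need
        print(n, units, len(poly))
```

In this toy setting the deep network uses a number of units linear in the number of inputs, while the number of monomials in the expanded polynomial, and hence the number of product units required by the assumed shallow form, grows much faster as inputs are added.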

Yoshua Bengio | Olivier Delalleau
