On the number of response regions of deep feed forward networks with piece-wise linear activations

This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework, based on computational geometry, for comparing deep and shallow models that belong to the family of piecewise linear functions. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single-layer version of the model. In the asymptotic regime, where the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$-layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor{n}/{n_0}\right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity, or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions than a shallow one. We consider this a first step towards understanding the complexity of these models and, specifically, towards providing suitable mathematical tools for future analysis.
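To make the gap concrete, the following display instantiates the two bounds above in the regime $n = 2n_0$; the specific values $n_0 = 2$ and $k = 10$ are chosen here purely for illustration and are not taken from the paper. Both models use the same total number $kn = 40$ of hidden units.

\[
\text{deep (lower bound):}\quad
\left\lfloor \tfrac{n}{n_0} \right\rfloor^{k-1} n^{n_0}
= 2^{k-1}\,(2n_0)^{n_0}
= 2^{9}\cdot 4^{2} = 8192,
\]
\[
\text{shallow (upper bound):}\quad
k^{n_0}\, n^{n_0}
= k^{n_0}\,(2n_0)^{n_0}
= 10^{2}\cdot 4^{2} = 1600.
\]

Since $2^{k-1}$ grows exponentially in $k$ while $k^{n_0}$ grows only polynomially, the deep lower bound eventually dominates the shallow upper bound for any fixed $n_0$ in this regime.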
