On the number of response regions of deep feed forward networks with piece-wise linear activations

This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework, based on computational geometry, for comparing deep and shallow models that belong to the family of piecewise linear functions. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single-layer version of the model. In the asymptotic regime, where the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$-layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor{n}/{n_0}\right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity, or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions than a shallow one. We consider this a first step towards understanding the complexity of these models and, specifically, towards providing suitable mathematical tools for future analysis.
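To make the gap concrete, the following display instantiates the two bounds above in the regime $n = 2n_0$; the specific values $n_0 = 2$ and $k = 10$ are chosen here purely for illustration and are not taken from the paper. Both models use the same total number $kn = 40$ of hidden units.

\[
\text{deep (lower bound):}\quad
\left\lfloor \tfrac{n}{n_0} \right\rfloor^{k-1} n^{n_0}
= 2^{k-1}\,(2n_0)^{n_0}
= 2^{9}\cdot 4^{2} = 8192,
\]
\[
\text{shallow (upper bound):}\quad
k^{n_0}\, n^{n_0}
= k^{n_0}\,(2n_0)^{n_0}
= 10^{2}\cdot 4^{2} = 1600.
\]

Since $2^{k-1}$ grows exponentially in $k$ while $k^{n_0}$ grows only polynomially, the deep lower bound eventually dominates the shallow upper bound for any fixed $n_0$ in this regime.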
