On the number of inference regions of deep feed forward networks with piece-wise linear activations

Abstract: This paper explores the complexity of deep feed forward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework, based on computational geometry, for comparing deep and shallow models that belong to the family of piecewise linear functions. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single-layer version of the model. In the asymptotic regime, with the number of inputs held constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then its number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$-layer model with $n$ hidden units on each layer it is $\Omega\!\left(\left({n}/{n_0}\right)^{k-1}n^{n_0}\right)$. The term $\left({n}/{n_0}\right)^{k-1}$ grows faster than $k^{n_0}$ when either $n$ goes to infinity, or when $k$ goes to infinity and $n > 2n_0$. We consider this a first step towards understanding the complexity of these models and, specifically, towards providing suitable mathematical tools for future analysis.
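
To make the comparison of the two growth terms concrete, here is a small worked example; the specific values $n_0 = 2$ and $n = 8$ (which satisfies $n > 2n_0$) are chosen purely for illustration and are not taken from the paper. Since both bounds share the factor $n^{n_0}$, only $\left(n/n_0\right)^{k-1}$ and $k^{n_0}$ need to be compared:
\[
  n_0 = 2,\ n = 8:\qquad \left(\tfrac{n}{n_0}\right)^{k-1} = 4^{\,k-1} \quad\text{versus}\quad k^{n_0} = k^{2},
\]
\[
  k = 4:\ 4^{3} = 64 > 4^{2} = 16, \qquad\qquad k = 8:\ 4^{7} = 16384 \gg 8^{2} = 64.
\]
For these example values, already at depth $k = 4$ the growth term of the deep lower bound exceeds that of the shallow upper bound, even though both models use the same total number $kn$ of hidden units.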