Depth Separation for Neural Networks

Let $f:\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}\to\mathbb{R}$ be a function of the form $f(\mathbf{x},\mathbf{x}') = g(\langle\mathbf{x},\mathbf{x}'\rangle)$ for some $g:[-1,1]\to \mathbb{R}$. We give a simple proof that poly-size depth-two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low-degree polynomial. Moreover, for many choices of $g$, such as $g(x)=\sin(\pi d^3x)$, the number of neurons must be $2^{\Omega\left(d\log(d)\right)}$. Furthermore, the result holds w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$. Since many functions of this form can be well approximated by poly-size depth-three networks with poly-bounded weights, this establishes a separation between depth-two and depth-three networks w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.
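To make the depth-three upper bound concrete, here is a minimal numerical sketch (not the paper's formal construction) of why two hidden layers suffice for such inner-product functions: on the unit sphere, $\langle\mathbf{x},\mathbf{x}'\rangle = (\|\mathbf{x}+\mathbf{x}'\|^2 - 2)/2$, so one hidden ReLU layer can recover the inner product by approximating the univariate square coordinate-wise, and a second hidden layer can then approximate the univariate $g$. The helpers `pwl_weights`/`pwl_apply`, the stand-in $g(t)=\sin(\pi t)$ (in place of the paper's $\sin(\pi d^3 x)$), and all sizes are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_weights(func, lo, hi, n_knots):
    """One-hidden-layer ReLU weights interpolating `func` piecewise-linearly
    at n_knots equally spaced points on [lo, hi]."""
    knots = np.linspace(lo, hi, n_knots)
    vals = func(knots)
    slopes = np.diff(vals) / np.diff(knots)  # slope on each segment
    coeffs = np.empty(n_knots - 1)
    coeffs[0] = slopes[0]                    # initial slope
    coeffs[1:] = np.diff(slopes)             # slope change at each interior knot
    return knots[:-1], coeffs, vals[0]

def pwl_apply(t, knots, coeffs, f_lo):
    # f_hat(t) = f(lo) + sum_k coeffs[k] * relu(t - knots[k])
    return f_lo + relu(t[:, None] - knots) @ coeffs

rng = np.random.default_rng(0)
d, n = 20, 2000  # illustrative dimension and sample size
x = rng.normal(size=(n, d)); x /= np.linalg.norm(x, axis=1, keepdims=True)
xp = rng.normal(size=(n, d)); xp /= np.linalg.norm(xp, axis=1, keepdims=True)

g = lambda t: np.sin(np.pi * t)  # illustrative stand-in for the paper's g

# Hidden layer 1: recover <x, x'> = (||x + x'||^2 - 2) / 2 on the sphere,
# via a ReLU approximation of s -> s^2 applied to each coordinate of x + x'.
sq_knots, sq_coeffs, sq_lo = pwl_weights(lambda s: s * s, -2.0, 2.0, 257)
sq = pwl_apply((x + xp).ravel(), sq_knots, sq_coeffs, sq_lo).reshape(n, d)
inner = np.clip(0.5 * (sq.sum(axis=1) - 2.0), -1.0, 1.0)

# Hidden layer 2: ReLU approximation of the univariate g on [-1, 1].
g_knots, g_coeffs, g_lo = pwl_weights(g, -1.0, 1.0, 257)
out = pwl_apply(inner, g_knots, g_coeffs, g_lo)

target = g(np.sum(x * xp, axis=1))
print("max abs error:", np.abs(out - target).max())
```

Note that all weights in this sketch stay polynomially bounded, matching the poly-bounded-weight regime of the depth-three upper bound; the lower bound in the abstract says precisely that no analogous shortcut exists at depth two.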
