Universal Approximation with Deep Narrow Networks

The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$, is dense in $C(K; \mathbb{R}^m)$ for compact $K \subseteq \mathbb{R}^n$. This covers every activation function used in practice, and also includes polynomial activation functions, unlike the classical version of the theorem; this highlights a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result: in particular, nowhere differentiable activation functions, density in noncompact domains with respect to the $L^p$ norm, and how the width may be reduced to just $n + m + 1$ for `most' activation functions.
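To make the network class concrete, the following is a minimal sketch (not from the paper; it assumes PyTorch, and the function name `deep_narrow_net` and its defaults are illustrative) of a fully connected network whose hidden layers all have the fixed width $n + m + 2$, with the depth and the activation $\rho$ left as free parameters.

```python
# Minimal sketch of the network class in the theorem: every hidden layer has
# width n + m + 2, the depth is arbitrary, and rho is the activation function.
# This only illustrates the architecture; it is not the paper's construction.
import torch.nn as nn

def deep_narrow_net(n, m, depth, rho=nn.ReLU):
    """Fully connected network from R^n to R^m with hidden width n + m + 2."""
    width = n + m + 2
    layers = [nn.Linear(n, width), rho()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), rho()]
    layers.append(nn.Linear(width, m))  # final affine layer, no activation
    return nn.Sequential(*layers)

# Example: a depth-20 network from R^3 to R^2 with hidden width 3 + 2 + 2 = 7.
net = deep_narrow_net(n=3, m=2, depth=20)
```

In this regime it is the `depth` parameter, rather than the fixed hidden width, that is allowed to grow to achieve arbitrarily good approximation.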
