The Role of Depth, Width, and Activation Complexity in the Number of Linear Regions of Neural Networks

Many feedforward neural networks generate continuous and piecewise-linear (CPWL) mappings. Specifically, they partition the input domain into regions on which the mapping is affine. The number of these so-called linear regions offers a natural metric to characterize the expressiveness of CPWL mappings. Although the precise determination of this quantity is often out of reach, bounds have been proposed for specific architectures, including the well-known ReLU and Maxout networks. In this work, we propose a more general perspective and provide precise bounds on the maximal number of linear regions of CPWL networks based on three sources of expressiveness: depth, width, and activation complexity. Our estimates rely on the combinatorial structure of convex partitions and highlight the distinctive role of depth, which, on its own, can increase the number of regions exponentially. We then introduce a complementary stochastic framework to estimate the average number of linear regions produced by a CPWL network architecture. Under reasonable assumptions, the expected density of linear regions along any 1D path is bounded by the product of depth, width, and a measure of activation complexity, up to a scaling factor. In this regime, the three sources of expressiveness play an identical role: the exponential growth with depth is no longer observed.
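To make the notion of linear-region density along a 1D path concrete, the sketch below (not taken from the paper; the helper names, network sizes, and sampling density are illustrative assumptions) estimates how many linear regions a randomly initialized ReLU network crosses along a line segment by counting changes in the activation pattern between densely sampled points.

    import numpy as np

    rng = np.random.default_rng(0)

    def init_relu_net(in_dim, width, depth):
        # He-style random initialization of a fully connected ReLU network
        # (hypothetical helper; sizes are illustrative, not from the paper).
        dims = [in_dim] + [width] * depth
        return [
            (rng.normal(0.0, np.sqrt(2.0 / dims[l]), size=(dims[l + 1], dims[l])),
             np.zeros(dims[l + 1]))
            for l in range(depth)
        ]

    def activation_pattern(layers, x):
        # Concatenated on/off states of every ReLU unit at input x; two inputs
        # with the same pattern lie in the same linear region of the network.
        states, h = [], x
        for W, b in layers:
            pre = W @ h + b
            states.append(pre > 0)
            h = np.maximum(pre, 0.0)
        return np.concatenate(states)

    def count_regions_on_segment(layers, x0, x1, n_samples=10_000):
        # Walk along the segment [x0, x1] and count activation-pattern changes.
        # This lower-bounds the number of linear regions the segment crosses.
        regions = 1
        prev = activation_pattern(layers, x0)
        for t in np.linspace(0.0, 1.0, n_samples)[1:]:
            cur = activation_pattern(layers, (1.0 - t) * x0 + t * x1)
            if not np.array_equal(cur, prev):
                regions += 1
                prev = cur
        return regions

    # Example: a depth-4, width-20 ReLU network on a random segment in R^10.
    layers = init_relu_net(in_dim=10, width=20, depth=4)
    x0, x1 = rng.normal(size=10), rng.normal(size=10)
    print("linear regions met along the segment:",
          count_regions_on_segment(layers, x0, x1))

Counting pattern changes at finitely many sample points only lower-bounds the true number of regions crossed, since regions narrower than the sampling step can be missed; increasing n_samples tightens the estimate.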
