What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory

We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit (ReLU) activations to data. We propose a new function space, reminiscent of classical bounded-variation-type spaces, that captures the compositional structure of deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data-fitting problems over functions from this space. The function space consists of compositions of functions from the Banach spaces of second-order bounded variation in the Radon domain. These are Banach spaces with sparsity-promoting norms, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank-bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.
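To make the recasting of the variational problem as a finite-dimensional training problem concrete, the following is a minimal sketch of the weight-decay / path-norm connection in the single-hidden-layer case, assuming only the standard positive-homogeneity (rescaling) argument for ReLU units. The notation (\(f_\theta\), \(v_k\), \(w_k\), \(b_k\), the loss \(\mathcal{L}\), and the regularization parameter \(\lambda\)) is introduced here for illustration and is not taken from the abstract; the deep, compositional case treated in the paper is analogous but more involved. A shallow ReLU network with \(K\) neurons takes the form
\[
f_\theta(x) = \sum_{k=1}^{K} v_k \, \rho\!\left(w_k^\top x - b_k\right), \qquad \rho(t) = \max\{t, 0\}.
\]
Because \(\rho\) is positively homogeneous, the rescaling \((v_k, w_k, b_k) \mapsto (v_k/\alpha, \alpha w_k, \alpha b_k)\) leaves \(f_\theta\) unchanged for every \(\alpha > 0\), and minimizing the weight-decay penalty over this invariance gives
\[
\min_{\alpha > 0} \; \frac{1}{2}\left(\frac{|v_k|^2}{\alpha^2} + \alpha^2 \|w_k\|_2^2\right) = |v_k| \, \|w_k\|_2,
\]
so the weight-decay-regularized training problem
\[
\min_{\theta} \; \sum_{i=1}^{N} \mathcal{L}\!\left(y_i, f_\theta(x_i)\right) + \frac{\lambda}{2} \sum_{k=1}^{K} \left(|v_k|^2 + \|w_k\|_2^2\right)
\]
attains the same functions as the path-norm-type regularized problem
\[
\min_{\theta} \; \sum_{i=1}^{N} \mathcal{L}\!\left(y_i, f_\theta(x_i)\right) + \lambda \sum_{k=1}^{K} |v_k| \, \|w_k\|_2.
\]
It is this second, sparsity-promoting form of the regularizer that connects finite-dimensional network training to the bounded-variation-type norms of the function space described above.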
