Integral representations of shallow neural networks with the Rectified Power Unit activation function

In this work, we derive a formula for the integral representation of a shallow neural network with the Rectified Power Unit (RePU) activation function. Our first result addresses the representational capability of shallow RePU networks in the univariate case. Our multidimensional result characterizes the set of functions that can be represented with bounded norm and possibly unbounded width.
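
For orientation, the objects involved can be sketched as follows (a minimal sketch in standard notation from the ReLU integral-representation literature; the paper's exact parameterization may differ). The RePU activation of order p raises the ReLU to the p-th power, and an infinite-width shallow network integrates this unit against a signed measure over weights and biases:

\[
  \sigma_p(t) \;=\; \max(0,t)^p \;=\; \mathrm{ReLU}(t)^p, \qquad p \in \mathbb{N},
\]
\[
  f(x) \;=\; \int_{\mathbb{S}^{d-1} \times \mathbb{R}} \sigma_p\bigl(\langle w, x\rangle - b\bigr)\, d\mu(w,b),
  \qquad x \in \mathbb{R}^d,
\]

where \(\mu\) is a signed (Radon) measure. In this picture, a finite network of width \(n\) corresponds to \(\mu\) being a sum of \(n\) weighted point masses, and the total variation \(\|\mu\|_{\mathrm{TV}}\) plays the role of the network norm that is kept bounded while the width may be unbounded.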
