How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function - Part II: the Multi-D Case of Two Layers with Random First Layer

Randomized neural networks (randomized NNs), in which only the weights of the terminal layer are optimized, constitute a powerful model class that reduces the computational cost of training. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as the solution of an optimization problem in function space for a specific regularization functional and a fairly general loss. This work extends to multivariate NNs our prior work, in which we showed that, under certain conditions and for one-dimensional input, wide RSNs with ReLU activation behave like spline regression.
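To make the setup concrete, the following minimal sketch illustrates an RSN of the kind discussed above: the first-layer weights and biases are drawn at random and kept fixed, and only the terminal layer is fitted. The width, the toy data, and the explicit ridge penalty are illustrative assumptions for this example, not quantities taken from the paper, which analyzes the implicit regularization induced by training the last layer.

```python
import numpy as np

# Sketch of a randomized, shallow ReLU network (RSN): random, frozen first
# layer; only the terminal (output) layer is fitted, here via ridge
# regression. Width, data, and the penalty strength are illustrative choices.

rng = np.random.default_rng(0)

def random_relu_features(X, W, b):
    """Hidden-layer activations max(0, X W^T + b) for fixed random W, b."""
    return np.maximum(0.0, X @ W.T + b)

# Toy multivariate regression data (d-dimensional input).
n, d, width = 200, 3, 500
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Random, untrained first layer.
W = rng.normal(size=(width, d))
b = rng.normal(size=width)

# Fit only the last layer's weights; the small explicit ridge penalty stands
# in for the (implicit) regularization studied in the paper.
Phi = random_relu_features(X, W, b)
lam = 1e-3
v = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)

# Predictions of the trained RSN.
y_hat = Phi @ v
print("training MSE:", np.mean((y - y_hat) ** 2))
```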
