Random ReLU Features: Universality, Approximation, and Composition

In this work we propose random ReLU features models, motivated by both kernel methods and neural networks. We prove the universality of random ReLU features and analyze their generalization performance. Parallel to Barron's theorem, we consider the ReLU feature class, an extension of the reproducing kernel Hilbert space induced by random ReLU features, and prove a strong quantitative approximation theorem in which both the inner and outer weights of the approximating ReLU network are bounded by constants. We also prove a similar approximation theorem for compositions of functions in the ReLU feature class by multi-layer ReLU networks. A separation theorem between the ReLU feature class and its composition class follows as a consequence of the separation between shallow and deep networks. These results reveal nice properties of ReLU nodes from the viewpoint of approximation theory, providing support for regularizing the weights of ReLU networks and for using random ReLU features in practice. Our experiments confirm that the performance of random ReLU features is comparable to that of random Fourier features.
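
As a concrete illustration of the model class discussed above, the sketch below builds random ReLU features in the spirit of random Fourier features: inner weights are sampled once from a fixed distribution and never trained, and only the outer linear weights are fit. The specific sampling distribution (directions on the unit sphere with an appended bias coordinate), the feature count, and the ridge-regularized linear fit are illustrative assumptions, not necessarily the exact construction used in the paper.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

def random_relu_features(X, n_features, rng):
    """Map inputs through randomly sampled ReLU units.

    Inner weights are drawn once from a fixed distribution (here: uniform on
    the unit sphere after appending a constant bias coordinate) and kept
    fixed; only the outer linear weights are learned downstream.
    """
    n, d = X.shape
    # Append a constant coordinate so the random bias is part of the direction.
    X_aug = np.hstack([X, np.ones((n, 1))])
    W = rng.normal(size=(d + 1, n_features))
    W /= np.linalg.norm(W, axis=0, keepdims=True)   # random directions on the sphere
    return np.maximum(X_aug @ W, 0.0)               # ReLU of random projections

# Usage sketch: train only the outer weights with a regularized linear model.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)

Phi = random_relu_features(X_train, n_features=512, rng=rng)
clf = RidgeClassifier(alpha=1.0).fit(Phi, y_train)  # outer weights = linear model on features
```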

[1] Kurt Hornik et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.

[2] Andrew R. Barron et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.

[3] Yoh-Han Pao et al. Stochastic choice of basis functions in adaptive function approximation and the functional-link net, 1995, IEEE Trans. Neural Networks.

[4] F. Girosi. Approximation Error Bounds That Use VC-Bounds, 1995.

[5] G. Lewicki et al. Approximation by Superpositions of a Sigmoidal Function, 2003.

[6] Yuesheng Xu et al. Universal Kernels, 2006, J. Mach. Learn. Res.

[7] Chee Kheong Siew et al. Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes, 2006, IEEE Transactions on Neural Networks.

[8] Chee Kheong Siew et al. Extreme learning machine: Theory and applications, 2006, Neurocomputing.

[9] A. Rahimi et al. Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.

[10] G. Gnecco et al. Approximation Error Bounds via Rademacher's Complexity, 2008.

[11] Ameet Talwalkar et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.

[12] Tara N. Sainath et al. Kernel methods match Deep Neural Networks on TIMIT, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.

[14] Ohad Shamir et al. The Power of Depth for Feedforward Neural Networks, 2015, COLT.

[15] Tengyu Ma et al. On the Ability of Neural Nets to Express Distributions, 2017, COLT.

[16] Hyunjoong Kim et al. Functional Analysis I, 2017.

[17] Francis R. Bach et al. On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions, 2015, J. Mach. Learn. Res.

[18] Lorenzo Rosasco et al. Generalization Properties of Learning with Random Features, 2016, NIPS.

[19] Ambuj Tewari et al. But How Does It Work in Theory? Linear SVM with Random Features, 2018, NeurIPS.