On the Approximation Capabilities of ReLU Neural Networks and Random ReLU Features

Inspired by Barron's seminal work on quantitative approximation bounds for neural networks with sigmoidal activation units, we study the approximation properties of neural networks with ReLU units. By considering functions expressed as transforms of signed measures under the map induced by ReLU units, we prove approximation results stronger than Barron's: for a given approximation accuracy, we obtain upper bounds on both the inner and the outer weights. We also extend the approximation result to multi-layer networks and prove a depth separation result for the function class we consider. Because of the close connection between single-hidden-layer neural networks and random features models, we further study the approximation properties of random ReLU features. We give sufficient conditions for the universality of random ReLU features and describe a random ReLU features algorithm with a provable learning rate. Finally, we generalize the result on random ReLU features to a broader class of random features.
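To make the random-features part concrete, the sketch below illustrates the generic random-features recipe that a random ReLU features method follows: hidden-unit weights and biases are sampled once from a fixed distribution and frozen, and only the outer linear weights are fit (here by ridge regression in closed form). The sampling distribution, the regularization value, and all function names are illustrative assumptions, not the paper's prescription.

```python
# Minimal sketch of a random ReLU features regressor (illustrative, not the
# paper's algorithm): random frozen inner weights, ridge-fit outer weights.
import numpy as np

def random_relu_features(X, W, b):
    """Map inputs X of shape (n, d) to ReLU features max(0, X @ W + b)."""
    return np.maximum(0.0, X @ W + b)

def fit_random_relu_regressor(X, y, n_features=500, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Inner weights are random and never trained (Gaussian directions and
    # uniform biases are one of many reasonable sampling choices).
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    b = rng.uniform(-1.0, 1.0, size=n_features)
    Phi = random_relu_features(X, W, b)
    # Outer weights solve a ridge-regression problem in closed form.
    A = Phi.T @ Phi + reg * np.eye(n_features)
    c = np.linalg.solve(A, Phi.T @ y)
    return W, b, c

def predict(X, W, b, c):
    return random_relu_features(X, W, b) @ c

if __name__ == "__main__":
    # Usage: learn a smooth 1-D target from noisy samples.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(200)
    W, b, c = fit_random_relu_regressor(X, y)
    X_test = np.linspace(-1, 1, 100)[:, None]
    print(predict(X_test, W, b, c)[:5])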

[1] Yuesheng Xu et al. Universal Kernels, 2006, J. Mach. Learn. Res.

[2] Kurt Hornik et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[3] Valdir Antonio Menegatto et al. Sharp estimates for eigenvalues of integral operators generated by dot product kernels on the sphere, 2014, J. Approx. Theory.

[4] Francis R. Bach. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.

[5] Amit Daniely et al. Depth Separation for Neural Networks, 2017, COLT.

[6] Tara N. Sainath et al. Kernel methods match Deep Neural Networks on TIMIT, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Ameet Talwalkar et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.

[8] Tengyu Ma et al. On the Ability of Neural Nets to Express Distributions, 2017, COLT.

[9] Chee Kheong Siew et al. Universal Approximation using Incremental Constructive Feedforward Networks with Random Hidden Nodes, 2006, IEEE Transactions on Neural Networks.

[10] Lorenzo Rosasco et al. Learning with SGD and Random Features, 2018, NeurIPS.

[11] G. Gnecco et al. Approximation Error Bounds via Rademacher's Complexity, 2008.

[12] Lorenzo Rosasco et al. Generalization Properties of Learning with Random Features, 2016, NIPS.

[13] Allan Pinkus et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function, 1991, Neural Networks.

[14] Hyunjoong Kim et al. Functional Analysis I, 2017.

[15] Benjamin Recht et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[16] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.

[17] Guang-Bin Huang et al. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels, 2014, Cognitive Computation.

[18] George Cybenko. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[19] Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.

[20] Ohad Shamir et al. The Power of Depth for Feedforward Neural Networks, 2015, COLT.

[21] G. Pisier. Remarques sur un résultat non publié de B. Maurey, 1981.

[22] Andreas Christmann et al. Support Vector Machines, 2008, Data Mining and Knowledge Discovery Handbook.

[23] F. Girosi. Approximation Error Bounds That Use VC-Bounds, 1995.

[24] Francis R. Bach. On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions, 2015, J. Mach. Learn. Res.

[25] Lawrence K. Saul et al. Kernel Methods for Deep Learning, 2009, NIPS.