On the Approximation Lower Bound for Neural Nets with Random Weights

A random net is a shallow neural network whose hidden layer is randomly assigned and then frozen, while the output layer is trained by convex optimization. Randomizing the hidden weights is an effective way to sidestep the non-convexity inherent in standard gradient-descent training, and it has recently been adopted in the study of deep learning theory. Here, we investigate the expressive power of random nets. We show that, despite the well-known fact that a shallow neural network is a universal approximator, a random net cannot achieve zero approximation error even for smooth functions. In particular, we prove that for a class of smooth functions, if the proposal distribution of the random weights is compactly supported, then the approximation error is bounded below by a positive constant. The proof builds on ridgelet analysis and harmonic analysis for neural networks, using the Plancherel theorem and an estimate of the truncated tail of the parameter distribution. We corroborate the theoretical results with simulation studies, from which two main take-home messages emerge: (i) not every distribution for selecting the random weights yields a universal approximator; (ii) a suitable assignment of the random weights exists, but it depends to some degree on the complexity of the target function.
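To make the setting concrete, below is a minimal sketch of such a random net, assuming a uniform (hence compactly supported) proposal distribution and a ReLU activation; the function names, the ridge-regression output training, and all hyperparameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def fit_random_net(X, y, width=512, radius=1.0, reg=1e-3, seed=0):
    """Freeze a random hidden layer, then train only the output layer (convex)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Hidden weights and biases are drawn once from a compactly supported
    # proposal (here uniform on [-radius, radius]) and never updated.
    W = rng.uniform(-radius, radius, size=(d, width))
    b = rng.uniform(-radius, radius, size=width)
    H = np.maximum(X @ W + b, 0.0)  # random ReLU features
    # Output layer: ridge regression, a convex problem with a closed-form solution.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(width), H.T @ y)
    return W, b, beta

def predict_random_net(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta
```

In this setting only `beta` is learned, so training is convex; the lower bound discussed above says that, with a bounded-support proposal as in this sketch, increasing `width` alone cannot drive the approximation error to zero for every function in the smooth class considered.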
