Approximation by Combinations of ReLU and Squared ReLU Ridge Functions With $\ell^1$ and $\ell^0$ Controls

We establish $L^{\infty}$ and $L^{2}$ error bounds for functions of many variables that are approximated by linear combinations of rectified linear unit (ReLU) and squared ReLU ridge functions with $\ell^{1}$ and $\ell^{0}$ controls on their inner and outer parameters. With the squared ReLU ridge function, we show that the $L^{2}$ approximation error is inversely proportional to the inner layer $\ell^{0}$ sparsity and need only be sublinear in the outer layer $\ell^{0}$ sparsity. Our constructions are obtained using a variant of the Maurey–Jones–Barron probabilistic method, which can be interpreted as either stratified sampling with proportionate allocation or two-stage cluster sampling. We also provide companion error lower bounds that reveal near optimality of our constructions. Despite the sparsity assumptions, we showcase the richness and flexibility of these ridge combinations by defining a large family of functions, in terms of certain spectral conditions, that are particularly well approximated by them.
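
For context, a minimal statement of the classical Maurey–Jones–Barron sampling bound underlying the method mentioned above (the paper develops a refined, stratified variant; the norms and constant here are the textbook ones, not the paper's sharpened rates): if $f$ lies in the closure of the convex hull of a set $G$ in a Hilbert space with $\sup_{g \in G} \|g\| \le b$, then for every $m \ge 1$ there exist $g_1, \dots, g_m \in G$ such that
\[
  \Bigl\| f - \frac{1}{m} \sum_{k=1}^{m} g_k \Bigr\|^2 \;\le\; \frac{b^2 - \|f\|^2}{m}.
\]
The bound follows by drawing $g_1, \dots, g_m$ i.i.d. from a probability measure on $G$ whose mean is $f$ and noting that the expected squared error of the empirical average equals the variance divided by $m$; stratifying this sampling scheme is what yields the improved dependence on the inner and outer $\ell^{0}$ sparsity levels.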
