Paper information - Breaking the Curse of Dimensionality with Convex Neural Networks

Breaking the Curse of Dimensionality with Convex Neural Networks

We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. In addition, we provide a simple geometric interpretation to the non-convex problem of addition of a new unit, which is the core potentially hard computational element in the framework of learning from continuously many basis functions. We provide simple conditions for convex relaxations to achieve the same generalization error bounds, even when constant-factor approximations cannot be found (e.g., because it is NP-hard such as for the zero-homogeneous activation function). We were not able to find strong enough convex relaxations and leave open the existence or non-existence of polynomial-time algorithms.
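As a rough illustration of the model class the abstract describes (not code from the paper), the sketch below builds a single-hidden-layer network with rectified linear units and shows why positive homogeneity matters: each hidden unit's input weights can be rescaled to unit norm, with the scale absorbed into the output weight, so that an l1 penalty on the output weights alone controls the size of the function. The names `predict`, `normalize_units`, `W`, and `eta` are assumptions chosen for this example, as is the use of the l2 norm on input weights.

```python
import numpy as np

def relu(u):
    # Rectified linear unit: positively homogeneous, relu(c*u) = c*relu(u) for c > 0.
    return np.maximum(u, 0.0)

def predict(X, W, eta):
    # f(x) = sum_j eta[j] * relu(W[j] @ x), evaluated for each row x of X.
    return relu(X @ W.T) @ eta

def normalize_units(W, eta):
    # Rescale every hidden unit to unit l2 norm and move the scale into eta.
    # By positive homogeneity this leaves the network's predictions unchanged.
    norms = np.linalg.norm(W, axis=1)
    return W / norms[:, None], eta * norms

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))    # 5 inputs in dimension 3 (illustrative data)
W = rng.normal(size=(4, 3))    # 4 hidden units (hypothetical weights)
eta = rng.normal(size=4)       # output weights

W_unit, eta_scaled = normalize_units(W, eta)

# l1 penalty on the output weights once the input weights are normalized.
penalty = np.abs(eta_scaled).sum()
```

With the input weights constrained to the unit sphere, letting the number of hidden units grow only changes how many terms carry the l1 penalty, which is the setting the abstract refers to as regularization on the output weights.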

Francis R. Bach | F. Bach

[1] H. Whitney. Analytic Extensions of Differentiable Functions Defined in Closed Sets , 1934 .

[2] Philip Wolfe,et al. An algorithm for quadratic programming , 1956 .

[3] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.

[4] G. Forsythe,et al. On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere , 1965 .

[5] R. Schneider. Zu einem Problem von Shephard über die Projektionen konvexer Körper , 1967 .

[6] V. F. Dem'yanov,et al. The Minimization of a Smooth Convex Functional on a Convex Set , 1967 .

[7] E. Bolker. A class of convex bodies , 1969 .

[8] 丸山徹. Convex Analysisの二,三の進展について , 1977 .

[9] J. Dunn,et al. Conditional gradient algorithms with open loop step size rules , 1978 .

[10] J. Friedman,et al. Projection Pursuit Regression , 1981 .

[11] M. Savard. Bach , 1985 .

[12] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.

[13] Geoffrey E. Hinton,et al. Learning representations by back-propagation errors, nature , 1986 .

[14] W. Rudin. Real and complex analysis, 3rd ed. , 1987 .

[15] Herbert Edelsbrunner,et al. Algorithms in Combinatorial Geometry , 1987, EATCS Monographs in Theoretical Computer Science.

[16] J. Lindenstrauss,et al. Approximation of zonoids by zonotopes , 1989 .

[17] R. DeVore,et al. Optimal nonlinear approximation , 1989 .

[18] Ker-Chau Li,et al. Sliced Inverse Regression for Dimension Reduction , 1991 .

[19] L. Evans. Measure theory and fine properties of functions , 1992 .

[20] Andrew R. Barron,et al. Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[21] Leo Breiman,et al. Hinging hyperplanes for regression, classification, and function approximation , 1993, IEEE Trans. Inf. Theory.

[22] Allan Pinkus,et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1991, Neural Networks.

[23] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .

[24] Peter L. Bartlett,et al. Efficient agnostic learning of neural networks with bounded fan-in , 1996, IEEE Trans. Inf. Theory.

[25] J. Matousek,et al. Improved upper bounds for approximation by zonotopes , 1996 .

[26] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[27] Y. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization , 1998 .

[28] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation , 1998 .

[29] Y. Makovoz. Uniform Approximation by Neural Networks , 1998 .

[30] P. Petrushev. Approximation by ridge functions and neural networks , 1999 .

[31] Allan Pinkus,et al. Approximation theory of the MLP model in neural networks , 1999, Acta Numerica.

[32] Alexander J. Smola,et al. Regularization with Dot-Product Kernels , 2000, NIPS.

[33] Ron Meir,et al. On the near optimality of the stochastic approximation of smooth functions by neural networks , 2000, Adv. Comput. Math..

[34] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[35] Marcello Sanguineti,et al. Bounds on rates of variable-basis and neural-network approximation , 2001, IEEE Trans. Inf. Theory.

[36] Vladimir Koltchinskii,et al. Rademacher penalties and structural risk minimization , 2001, IEEE Trans. Inf. Theory.

[37] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .

[38] Martin Burger,et al. Error Bounds for Approximation with Neural Networks , 2001, J. Approx. Theory.

[39] Alexander Barvinok,et al. A course in convexity , 2002, Graduate studies in mathematics.

[40] Chong Gu. Smoothing Spline Anova Models , 2002 .

[41] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[42] Leonidas J. Guibas,et al. Zonotopes as bounding volumes , 2003, SODA '03.

[43] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.

[44] Ulrike von Luxburg,et al. Distance-Based Classification with Lipschitz Functions , 2004, J. Mach. Learn. Res..

[45] Hrushikesh Narhar Mhaskar,et al. On the tractability of multivariate integration and approximation by neural networks , 2004, J. Complex..

[46] A. Berlinet,et al. Reproducing kernel Hilbert spaces in probability and statistics , 2004 .

[47] Michael I. Jordan,et al. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004 .

[48] Ronald,et al. Learning representations by backpropagating errors , 2004 .

[49] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[50] Ji Zhu,et al. Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..

[51] Baver Okutmustur. Reproducing kernel Hilbert spaces , 2005 .

[52] Nicolas Le Roux,et al. Convex Neural Networks , 2005, NIPS.

[53] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program..

[54] Michael I. Jordan,et al. Convexity, Classification, and Risk Bounds , 2006 .

[55] Prasad Raghavendra,et al. Hardness of Learning Halfspaces with Noise , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[56] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[57] Hrushikesh Narhar Mhaskar,et al. Weighted quadrature formulas and approximation by zonal function networks on the sphere , 2006, J. Complex..

[58] Vitaly Maiorov,et al. Approximation by neural networks and learning theory , 2006, J. Complex..

[59] Alexander A. Sherstov,et al. Cryptographic Hardness for Learning Intersections of Halfspaces , 2006, FOCS.

[60] M. Yuan,et al. Model selection and estimation in regression with grouped variables , 2006 .

[61] Hao Helen Zhang,et al. Component selection and smoothing in multivariate nonparametric regression , 2006, math/0702659.

[62] Ji Zhu,et al. l1 Regularization in Infinite Dimensional Feature Spaces , 2007, COLT.

[63] Robert E. Mahony,et al. Optimization Algorithms on Matrix Manifolds , 2007 .

[64] Nathan Srebro,et al. ` 1 Regularization in Infinite Dimensional Feature Spaces , 2007 .

[65] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.

[66] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[67] Nicolas Le Roux,et al. Continuous Neural Networks , 2007, AISTATS.