Uniform approximation of functions with random bases

Random networks of nonlinear functions have a long history of empirical success in function fitting but few theoretical guarantees. In this paper, using techniques from probability on Banach spaces, we analyze a specific architecture of random nonlinearities, provide L∞ and L2 error bounds for approximating functions in Reproducing Kernel Hilbert Spaces, and discuss scenarios in which these expansions are dense in the continuous functions. We discuss connections between these random nonlinear networks and popular machine learning algorithms and show experimentally that these networks provide competitive performance at far lower computational cost on large-scale pattern recognition tasks.
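As a rough illustration of the kind of expansion the abstract describes, the sketch below fits a weighted sum of randomly parameterized nonlinearities to a smooth one-dimensional target and reports sup-norm and root-mean-square errors on a grid. The cosine nonlinearity, the Gaussian sampling of frequencies, the ridge-regularized least-squares fit, the target function, and all names (`random_cosine_features`, `fit_random_expansion`, `bandwidth`, `ridge`) are assumptions chosen for the example; they are not taken from the paper.

```python
import numpy as np


def random_cosine_features(x, omegas, phases):
    """Map inputs x of shape (n, d) to features cos(x @ omegas + phases), shape (n, K)."""
    return np.cos(x @ omegas + phases)


def fit_random_expansion(x, y, num_features=200, bandwidth=1.0, ridge=1e-4, seed=0):
    """Draw random nonlinearities once, then fit only the linear output weights."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Assumption: Gaussian frequencies and uniform phases (illustrative choice).
    omegas = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    z = random_cosine_features(x, omegas, phases)
    # Ridge-regularized least squares for the weights; the features stay fixed.
    gram = z.T @ z + ridge * np.eye(num_features)
    weights = np.linalg.solve(gram, z.T @ y)
    return omegas, phases, weights


def predict(x, omegas, phases, weights):
    """Evaluate the fitted expansion at inputs x of shape (n, d)."""
    return random_cosine_features(x, omegas, phases) @ weights


def target(x):
    """Hypothetical smooth target used only for this demonstration."""
    return np.sin(2.0 * x[:, 0]) * np.exp(-0.5 * x[:, 0] ** 2)


x_train = np.linspace(-3.0, 3.0, 400).reshape(-1, 1)
x_test = np.linspace(-3.0, 3.0, 1000).reshape(-1, 1)

omegas, phases, weights = fit_random_expansion(x_train, target(x_train))
residual = predict(x_test, omegas, phases, weights) - target(x_test)

sup_error = np.max(np.abs(residual))        # empirical sup-norm error on the grid
rms_error = np.sqrt(np.mean(residual ** 2))  # empirical L2-type error on the grid
print(f"sup error: {sup_error:.2e}, rms error: {rms_error:.2e}")
```

Only the output weights are optimized here; the random parameters are sampled once and never updated, so the fit reduces to a single linear solve.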
