Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning

Randomized neural networks are immortalized in this AI Koan:

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play tic-tac-toe," Sussman replied. "Why is the net wired randomly?" asked Minsky. Sussman replied, "I do not want it to have any preconceptions of how to play." Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty," replied Minsky. At that moment, Sussman was enlightened.

We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities.
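To make the construction concrete, the sketch below illustrates the random-kitchen-sinks recipe in Python, assuming random Fourier (cosine) features as the bank of randomized nonlinearities and ridge regression to fit the output weights; the function names, feature choice, and solver here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def random_kitchen_sinks_fit(X, y, n_features=500, gamma=1.0, reg=1e-3, rng=None):
    """Fit a weighted sum of random nonlinearities phi_k(x) = cos(w_k . x + b_k).

    Illustrative sketch: random Fourier features plus a ridge-regression
    fit of the output weights; only the linear weights are learned.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Draw the nonlinearity parameters at random and never optimize them.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    Z = np.cos(X @ W + b)                       # bank of random nonlinearities
    # Fit only the output weights alpha (convex least-squares problem).
    A = Z.T @ Z + reg * np.eye(n_features)
    alpha = np.linalg.solve(A, Z.T @ y)
    return W, b, alpha

def random_kitchen_sinks_predict(X, W, b, alpha):
    """Weighted sum of the fixed random features."""
    return np.cos(X @ W + b) @ alpha

if __name__ == "__main__":
    # Toy usage: classify points by the sign of the predicted score.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sign(X[:, 0] * X[:, 1])              # a simple nonlinear target
    W, b, alpha = random_kitchen_sinks_fit(X, y, n_features=300, rng=0)
    acc = np.mean(np.sign(random_kitchen_sinks_predict(X, W, b, alpha)) == y)
    print(f"training accuracy: {acc:.2f}")
```

The point of the construction is that the nonlinearity parameters W and b are sampled once and left untrained; only the output weights alpha are fit, so learning reduces to a convex problem over the weighted sum.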
