Interpreting Extreme Learning Machine as an Approximation to an Infinite Neural Network

Extreme Learning Machine (ELM) is a neural network architecture in which the hidden layer weights are chosen randomly and the output layer weights are determined analytically. We interpret ELM as an approximation to a network with an infinite number of hidden units. The operation of such an infinite network is captured by the neural network kernel (NNK). We compare ELM and the NNK both as part of a kernel method and in a neural network context. The insights gained from this analysis lead us to strongly recommend performing model selection also on the variance of the ELM hidden layer weights, and not only on the number of hidden units, as is usually done with ELM. We also discuss some properties of ELM that may have been interpreted too strongly in previous works.
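
To make the two objects being compared concrete, the following is a minimal sketch in Python/NumPy (an assumed illustration, not code from the paper): an ELM trained exactly as described above, with random hidden weights and output weights obtained analytically via the Moore-Penrose pseudoinverse, alongside the neural network kernel of Williams (1998) for the erf activation. The parameters weight_std and sigma2 are the hidden-weight scale that the abstract recommends including in model selection; their names are ours.

    import numpy as np

    def elm_fit(X, y, n_hidden=100, weight_std=1.0, seed=0):
        # Hidden layer weights and biases are drawn at random and never
        # trained; weight_std sets their standard deviation, the quantity
        # the abstract recommends model-selecting alongside n_hidden.
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, weight_std, size=(X.shape[1], n_hidden))
        b = rng.normal(0.0, weight_std, size=n_hidden)
        H = np.tanh(X @ W + b)            # hidden layer activations
        # Output layer weights are determined analytically via the
        # Moore-Penrose pseudoinverse (least-squares solution).
        beta = np.linalg.pinv(H) @ y
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    def nn_kernel(X1, X2, sigma2=1.0):
        # Neural network kernel (Williams, 1998) for the erf activation:
        # the covariance function of a network with infinitely many hidden
        # units whose Gaussian weights have variance sigma2.
        A1 = np.hstack([np.ones((X1.shape[0], 1)), X1])  # augment with bias
        A2 = np.hstack([np.ones((X2.shape[0], 1)), X2])
        num = 2.0 * sigma2 * (A1 @ A2.T)
        d1 = 1.0 + 2.0 * sigma2 * np.einsum('ij,ij->i', A1, A1)
        d2 = 1.0 + 2.0 * sigma2 * np.einsum('ij,ij->i', A2, A2)
        return (2.0 / np.pi) * np.arcsin(num / np.sqrt(np.outer(d1, d2)))

Cross-validating over both n_hidden and weight_std (or sigma2 in the kernel view) would then implement the model selection procedure the abstract advocates.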
