Extreme Learning Machine: A Robust Modeling Technique? Yes!

This paper describes the original (basic) Extreme Learning Machine (ELM) and studies its properties, such as robustness and sensitivity to variable selection. Several extensions of the original ELM are then presented and compared. First, the Tikhonov-Regularized Optimally-Pruned Extreme Learning Machine (TROP-ELM) is summarized as an improvement of the Optimally-Pruned Extreme Learning Machine (OP-ELM), adding an L2 regularization penalty within OP-ELM. Second, a methodology to linearly ensemble ELMs (ELM-ELM) is presented in order to improve the performance of the original ELM. These methodologies (TROP-ELM and ELM-ELM) are tested on ten different data sets against state-of-the-art methods such as Support Vector Machines and Gaussian Processes, as well as against the original ELM and OP-ELM. A dedicated experiment testing the sensitivity of these methodologies to variable selection is also presented.
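For concreteness, below is a minimal sketch of the basic ELM for regression, assuming a tanh activation and NumPy. The function names (elm_fit, elm_predict, elm_ensemble_predict), the optional ridge argument standing in for the Tikhonov-style L2 penalty used by TROP-ELM-like variants, and the plain prediction average standing in for the paper's linear ensembling are illustrative assumptions, not the paper's exact method.

    import numpy as np

    def elm_fit(X, y, n_hidden=100, ridge=0.0, seed=None):
        """Basic ELM: random, untrained hidden layer; least-squares output weights.

        ridge > 0 adds an L2 (Tikhonov) penalty on the output weights, a
        simplified stand-in for the regularization used in TROP-ELM variants.
        """
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        # Input weights and biases are drawn at random and never trained.
        W = rng.standard_normal((n_features, n_hidden))
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)  # hidden-layer output matrix
        if ridge > 0.0:
            # Solve min ||H beta - y||^2 + ridge * ||beta||^2 in closed form.
            beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
        else:
            # Moore-Penrose pseudoinverse solution of H beta = y.
            beta = np.linalg.pinv(H) @ y
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    def elm_ensemble_predict(X, models):
        # Plain average of several independently trained ELMs; the paper's
        # ELM-ELM instead learns a linear combination of the individual outputs.
        return np.mean([elm_predict(X, *m) for m in models], axis=0)

A short usage example under the same assumptions: train five ELMs with different random hidden layers and average their predictions.

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    models = [elm_fit(X, y, n_hidden=50, ridge=1e-3, seed=s) for s in range(5)]
    y_hat = elm_ensemble_predict(X, models)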
