1-Norm extreme learning machine for regression and multiclass classification using Newton method

In this paper, a novel 1-norm extreme learning machine (ELM) for regression and multiclass classification is proposed. It is formulated as a linear programming problem whose solution is obtained by solving its dual exterior penalty problem, an unconstrained minimization problem, using a fast Newton method. The algorithm converges from any starting point and can be easily implemented in MATLAB. The main advantage of the proposed approach is that it leads to a sparse model representation: many components of the optimal solution vector become exactly zero, so the decision function can be determined with a much smaller number of hidden nodes than standard ELM requires. Numerical experiments were performed on a number of real-world benchmark datasets, and the results are compared with those of ELM using additive and radial basis function (RBF) hidden nodes, optimally pruned ELM (OP-ELM) and support vector machine (SVM) methods. The similar or better generalization performance of the proposed method on test data relative to ELM, OP-ELM and SVM clearly illustrates its applicability and usefulness.
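To make the sparsity claim concrete, the following Python sketch builds a random sigmoid hidden layer and fits the output weights under a 1-norm penalty. It is a minimal illustration under stated assumptions, not the paper's algorithm: scikit-learn's Lasso (coordinate descent) stands in for the LP/dual-exterior-penalty Newton solver described above, and the dataset, node count and regularization parameter are hypothetical.

```python
# Minimal sketch of a sparse 1-norm ELM with sigmoid additive hidden nodes.
# NOTE: the L1 solver below (scikit-learn Lasso) is a stand-in for the
# paper's Newton method on the dual exterior penalty problem; data and
# hyperparameters are illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def hidden_layer(X, W, b):
    """Sigmoid additive hidden nodes: H[i, j] = g(w_j . x_i + b_j)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Toy regression data (hypothetical; replace with a benchmark dataset).
X = rng.uniform(-1, 1, size=(200, 5))
y = np.sin(X.sum(axis=1)) + 0.05 * rng.standard_normal(200)

n_hidden = 100                                # deliberately overprovisioned
W = rng.uniform(-1, 1, size=(5, n_hidden))    # random input weights
b = rng.uniform(-1, 1, size=n_hidden)         # random biases

H = hidden_layer(X, W, b)

# L1-regularized fit of the output weights beta: many components become
# exactly zero, so far fewer than n_hidden nodes are needed at test time.
model = Lasso(alpha=1e-3, max_iter=10_000).fit(H, y)
beta = model.coef_

active = np.flatnonzero(np.abs(beta) > 1e-8)
print(f"hidden nodes kept: {active.size} of {n_hidden}")

# Prediction uses only the active hidden nodes.
y_hat = hidden_layer(X, W[:, active], b[active]) @ beta[active] + model.intercept_
```

Because the 1-norm penalty drives many output weights to exactly zero, the fitted model typically retains only a fraction of the hidden nodes, which is the sparse model representation the abstract refers to.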
