A Robust Extreme Learning Machine for pattern classification with outliers

In this paper we introduce a simple and efficient extension of the Extreme Learning Machine (ELM) network (Huang et al., 2006) that is robust to label noise, a type of outlier occurring in classification tasks. Such outliers usually result from mistakes made while labeling the data points (e.g. misjudgment by a specialist) or from typing errors made while creating the data files (e.g. striking an incorrect key on a keyboard). The proposed variant of the ELM, henceforth named Robust ELM (RELM), uses M-estimators to compute the output weights instead of the standard ordinary least squares (OLS) method. We evaluate the performance of the RELM using batch and recursive learning rules, and also introduce a model selection strategy based on Particle Swarm Optimization (PSO) to find an optimal architecture for datasets contaminated with non-Gaussian noise and outliers. By means of comprehensive computer simulations using synthetic and real-world datasets, we show that the proposed Robust ELM classifiers consistently outperform the original version.
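To make the central idea concrete, the following is a minimal sketch of how ELM output weights might be estimated in batch mode with an M-estimator instead of OLS. The abstract does not say which M-estimator, scale estimate, or solver the RELM uses, so the Huber weight function, the MAD-based scale, the iteratively reweighted least squares (IRLS) loop, the small ridge term, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def hidden_layer(X, W, b):
    """ELM hidden layer: fixed random projection followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def ols_weights(H, T, ridge=1e-8):
    """Standard ELM output weights via least squares (tiny ridge for stability)."""
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ T)

def huber_weights(r, k=1.345):
    """Huber IRLS weights (assumed choice): 1 inside [-k, k], k/|r| outside."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def m_estimate_weights(H, T, n_iter=20, ridge=1e-8):
    """Output weights by IRLS, solved independently for each output unit."""
    beta = ols_weights(H, T, ridge)  # OLS solution as the starting point
    for _ in range(n_iter):
        for j in range(T.shape[1]):
            r = T[:, j] - H @ beta[:, j]
            # Robust residual scale from the median absolute deviation (MAD).
            s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
            w = huber_weights(r / s)
            Hw = H * w[:, None]  # rows of H scaled by their sample weights
            A = Hw.T @ H + ridge * np.eye(H.shape[1])
            beta[:, j] = np.linalg.solve(A, Hw.T @ T[:, j])
    return beta
```

In a classification setting one would typically build `H = hidden_layer(X, W, b)` from randomly drawn `W` and `b`, encode the class labels as one target column per class in `T`, and predict with `np.argmax(H_new @ beta, axis=1)`. Samples with large residuals, such as mislabeled points, receive weights below one and therefore pull the solution less than they would under OLS.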

[1]  D. Calvetti,et al.  Tikhonov regularization and the L-curve for large discrete ill-posed problems , 2000 .

[2]  Bernard Widrow,et al.  Statistical efficiency of adaptive algorithms , 2003, Neural Networks.

[3]  Hyun-Chul Kim,et al.  Outlier Robust Gaussian Process Classification , 2008, SSPR/SPR.

[4]  Guang-Bin Huang,et al.  Face recognition based on extreme learning machine , 2011, Neurocomputing.

[5]  G. Warnock,et al.  Thinking About Thinking , 1975 .

[6]  Amaury Lendasse,et al.  TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization , 2011, Neurocomputing.

[7]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[8]  Klaus Neumann,et al.  Optimizing extreme learning machines via ridge regression and batch intrinsic plasticity , 2013, Neurocomputing.

[9]  Hilary Buxton,et al.  Invariance in radial basis function neural networks in human face classification , 1995, Neural Processing Letters.

[10]  Xingquan Zhu,et al.  Class Noise vs. Attribute Noise: A Quantitative Study , 2003, Artificial Intelligence Review.

[11]  Hongming Zhou,et al.  Extreme Learning Machine for Regression and Multiclass Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[12]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[13]  Yuan Lan,et al.  Ensemble of online sequential extreme learning machine , 2009, Neurocomputing.

[14]  Angelika Foerster An R And S Plus Companion To Applied Regression , 2016 .

[15]  Bernard Widrow,et al.  The No-Prop algorithm: A new learning algorithm for multilayer neural networks , 2013, Neural Networks.

[16]  V. Kvasnicka,et al.  Neural and Adaptive Systems: Fundamentals Through Simulations , 2001, IEEE Trans. Neural Networks.

[17]  Qinghua Zheng,et al.  Regularized Extreme Learning Machine , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[18]  Bernard Carlos Widrow,et al.  Thinking about thinking: the discovery of the LMS algorithm , 2005, IEEE Signal Process. Mag..

[19]  Narasimhan Sundararajan,et al.  A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks , 2006, IEEE Transactions on Neural Networks.

[20]  Q. M. Jonathan Wu,et al.  Human face recognition based on multidimensional PCA and extreme learning machine , 2011, Pattern Recognit..

[21]  Punyaphol Horata,et al.  Robust extreme learning machine , 2013, Neurocomputing.

[22]  Chien-Cheng Lee,et al.  Noisy time series prediction using M-estimator based robust radial basis function neural networks with growing and pruning techniques , 2009, Expert Syst. Appl..

[23]  J. Stevens,et al.  Outliers and influential data points in regression analysis. , 1984 .

[24]  Andries P. Engelbrecht,et al.  Computational Intelligence: An Introduction , 2002 .

[25]  Bernard Widrow,et al.  The No-Prop algorithm , 2013 .

[26]  Gene H. Golub,et al.  Matrix Computations, 3rd Edition , 2007 .

[27]  Shing-Chow Chan,et al.  On the Performance Analysis of the Least Mean M-Estimate and Normalized Least Mean M-Estimate Algorithms with Gaussian Inputs and Additive Gaussian and Contaminated Gaussian Noises , 2010, J. Signal Process. Syst..

[28]  Omar Ayad Learning under Concept Drift with Support Vector Machines , 2014, ICANN.

[29]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .

[30]  Dale Borowiak,et al.  Linear Models, Least Squares and Alternatives , 2001, Technometrics.

[31]  Guang-Bin Huang,et al.  An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels , 2014, Cognitive Computation.

[32]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[33]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces , 1997 .

[34]  Amaury Lendasse,et al.  Finding Originally Mislabels with MD-ELM , 2014, ESANN.

[35]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[36]  Andrew J. Chipperfield,et al.  Simplifying Particle Swarm Optimization , 2010, Appl. Soft Comput..

[37]  Amaury Lendasse,et al.  OP-ELM: Optimally Pruned Extreme Learning Machine , 2010, IEEE Transactions on Neural Networks.

[38]  M. Verleysen,et al.  Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[39]  James Kennedy,et al.  Defining a Standard for Particle Swarm Optimization , 2007, 2007 IEEE Swarm Intelligence Symposium.

[40]  Shalabh,et al.  Linear Models and Generalizations: Least Squares and Alternatives , 2007 .

[41]  A. Kai Qin,et al.  Evolutionary extreme learning machine , 2005, Pattern Recognit..

[42]  Dianhui Wang,et al.  Extreme learning machines: a survey , 2011, Int. J. Mach. Learn. Cybern..

[43]  Z. D. Bai,et al.  General M-Estimation , 1997 .

[44]  Chein-I Chang,et al.  Robust radial basis function neural networks , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[45]  Yuehua Wu,et al.  General M-estimation , 1997 .

[46]  Xindong Wu,et al.  Class noise vs. attribute noise , 2004 .

[47]  Jun Wang,et al.  Chaotic Time Series Prediction Based on a Novel Robust Echo State Network , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[48]  Han Wang,et al.  Ensemble Based Extreme Learning Machine , 2010, IEEE Signal Processing Letters.

[49]  Qinyu Zhu Extreme Learning Machine , 2013 .

[50]  James Kennedy,et al.  Particle swarm optimization , 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.

[51]  Tung-Sang Ng,et al.  Fast least mean M-estimate algorithms for robust adaptive filtering in impulse noise , 2000, 2000 10th European Signal Processing Conference.

[52]  Peter J. Rousseeuw,et al.  Robust Regression and Outlier Detection , 2005, Wiley Series in Probability and Statistics.