Is Extreme Learning Machine Feasible? A Theoretical Assessment (Part I)

An extreme learning machine (ELM) is a feedforward neural network (FNN)-like learning system in which the connections to the output neurons are adjustable, while the connections to and within the hidden neurons are randomly fixed. Numerous applications have demonstrated the feasibility and high efficiency of ELM-like systems. It has, however, remained open whether this holds for general applications. In this two-part paper, we conduct a comprehensive feasibility analysis of ELM. In Part I, we answer the question by theoretically justifying the following: 1) for some suitable activation functions, such as polynomial, Nadaraya-Watson, and sigmoid functions, ELM-like systems can attain the theoretical generalization bound of FNNs with all connections adjusted, i.e., they do not degrade the generalization capability of FNNs even though the connections to and within the hidden neurons are randomly fixed; 2) the number of hidden neurons needed for an ELM-like system to achieve this bound can be estimated; and 3) whenever the activation function is a polynomial, the resulting hidden-layer output matrix has full column rank, so the generalized-inverse technique can be applied efficiently to solve an ELM-like system; furthermore, in the nonpolynomial case, Tikhonov regularization can be applied to guarantee weak regularity without sacrificing generalization capability. In Part II, by contrast, we reveal a different aspect of the feasibility of ELM: there also exist activation functions for which the corresponding ELM degrades the generalization capability. The obtained results underpin the feasibility and efficiency of ELM-like systems, and also yield various generalizations and improvements of these systems.
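To make the setup concrete, the following NumPy sketch implements an ELM-like regressor of the kind described above: the input-to-hidden weights and biases are drawn at random and then fixed, and only the output weights are solved for, either via the Moore-Penrose generalized inverse of the hidden-layer output matrix or via Tikhonov (ridge) regularization. The sigmoid activation, Gaussian random weights, and all function names below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, ridge=0.0, seed=0):
    """Train a single-hidden-layer ELM-like regressor.

    Hidden weights/biases are random and fixed; only the output
    weights beta are learned by solving a linear least-squares problem.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Randomly fixed input-to-hidden weights and biases (never updated).
    W = rng.standard_normal((n_features, n_hidden))
    b = rng.standard_normal(n_hidden)
    # Hidden-layer output matrix H with a sigmoid activation.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    if ridge > 0.0:
        # Tikhonov-regularized solution: beta = (H'H + ridge*I)^{-1} H'y.
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    else:
        # Generalized-inverse solution: beta = pinv(H) @ y.
        beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: fit a noisy sine curve.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * np.random.default_rng(1).standard_normal(200)
W, b, beta = elm_fit(X, y, n_hidden=50, ridge=1e-6)
y_hat = elm_predict(X, W, b, beta)
```

When the hidden-layer output matrix has full column rank (as Part I establishes for polynomial activations), the pseudoinverse route yields the unique least-squares solution; the ridge term is the fallback that keeps the linear system well conditioned otherwise, mirroring the Tikhonov regularization discussed above.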
