An Inverse-Free and Scalable Sparse Bayesian Extreme Learning Machine for Classification Problems

Sparse Bayesian Extreme Learning Machine (SBELM) constructs an extremely sparse probabilistic model with low computational cost and high generalization. However, the hyperparameter (ARD prior) update rule in SBELM uses the diagonal elements of the inverted covariance matrix over the full training dataset, which raises two issues. First, inverting the Hessian matrix may suffer from ill-conditioning in some cases, which prevents SBELM from converging. Second, inverting the large covariance matrix to update the ARD priors requires $O(L^{3})$ computational memory ($L$: number of hidden nodes), which may cause memory overflow. To address these issues, this paper proposes an inverse-free SBELM called QN-SBELM, which integrates the gradient-based quasi-Newton (QN) method into SBELM to approximate the inverse covariance matrix. It takes $O(L^{2})$ computational complexity and simultaneously scales to large problems. QN-SBELM was evaluated on benchmark datasets of different sizes. Experimental results verify that QN-SBELM achieves more accurate results than SBELM with a sparser model, provides more stable solutions, and extends well to large-scale problems.
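The core idea above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it shows the standard rank-two BFGS update of an inverse-Hessian approximation (the classical quasi-Newton form), which costs $O(L^{2})$ per step since it needs only matrix-vector and outer products, alongside the standard sparse Bayesian ARD re-estimation rule that consumes the diagonal of the (approximate) covariance matrix. Function names and the quadratic test problem are illustrative assumptions.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    # One rank-two BFGS update of the inverse-Hessian approximation H
    # (classical quasi-Newton form, not the paper's exact algorithm).
    # s: step in the parameters, y: change in the gradient.
    # Only matrix-vector and outer products are used -> O(L^2) per
    # update, versus O(L^3) for an explicit matrix inversion.
    rho = 1.0 / (y @ s)
    Hy = H @ y                                   # O(L^2)
    return (H
            - rho * (np.outer(s, Hy) + np.outer(Hy, s))
            + (rho ** 2 * (y @ Hy) + rho) * np.outer(s, s))

def ard_update(alpha, mu, Sigma_diag):
    # Standard sparse Bayesian ARD re-estimation (as in Tipping's RVM,
    # which SBELM builds on): alpha_i <- gamma_i / mu_i^2 with
    # gamma_i = 1 - alpha_i * Sigma_ii. Note it needs only diag(Sigma),
    # which an inverse-free method can supply from its approximation.
    gamma = 1.0 - alpha * Sigma_diag
    return gamma / mu ** 2

# Tiny demonstration on a quadratic objective with SPD Hessian A,
# where the gradient difference is exactly y = A (w1 - w0).
rng = np.random.default_rng(0)
L = 5
A = np.diag(np.arange(1.0, L + 1))               # SPD Hessian
H = np.eye(L)                                    # initial inverse approximation
w0, w1 = rng.normal(size=L), rng.normal(size=L)
s = w1 - w0
y = A @ s
H_new = bfgs_inverse_update(H, s, y)
```

After the update, `H_new` satisfies the secant condition `H_new @ y == s`, which is what makes the approximation track the true inverse Hessian without ever forming it explicitly.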
