Sparse kernel minimum squared error using Householder transformation and Givens rotation

The baseline kernel minimum squared error (KMSE) model suffers from two obvious limitations: the lack of sparseness of its solution and the ill-posedness of the underlying problem. Previous sparse methods for KMSE have addressed the second limitation with a regularization strategy, which increases the computational cost because the regularization parameter must be determined. In this paper, a constructive sparse algorithm for KMSE (CS-KMSE) and its improved version (ICS-KMSE) are proposed to address both limitations simultaneously. Based on the Householder transformation, CS-KMSE selects as significant nodes the training samples that yield the largest reductions in the objective function. ICS-KMSE augments CS-KMSE with a replacement mechanism based on Givens rotations, which gives it better sparseness than CS-KMSE. Neither CS-KMSE nor ICS-KMSE requires a regularization parameter before it begins to select significant nodes, which saves model-selection time. More importantly, both algorithms terminate with an early stopping strategy that acts as an implicit regularizer, avoiding overfitting and controlling the sparsity level of the solution relative to the baseline KMSE. Finally, compared with other algorithms, both CS-KMSE and ICS-KMSE achieve superior sparseness, and extensive comparisons confirm their effectiveness and feasibility.
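The abstract only sketches the selection idea, so the following minimal Python sketch illustrates what a Householder-based greedy node selection with early stopping could look like. All names here (`greedy_sparse_kmse`, `rbf_kernel`, the tolerance `tol`) are hypothetical illustrations rather than the authors' implementation, and for clarity it recomputes a Householder QR factorization (via `numpy.linalg.qr`) for each candidate instead of updating the factorization incrementally with Householder transformations and Givens rotations as the paper proposes.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    d2 = np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * d2)

def greedy_sparse_kmse(X, y, gamma=1.0, max_nodes=20, tol=1e-3):
    """Forward greedy selection of significant nodes for a sparse KMSE-style model.

    At each step, the candidate training sample whose kernel column yields the
    largest drop in the squared-error objective is added to the node set; each
    least-squares subproblem is solved through a Householder-based QR
    factorization (numpy.linalg.qr). Selection stops early once the relative
    improvement falls below `tol`, mimicking the implicit regularization role
    of early stopping described in the abstract.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)                      # full kernel matrix, n x n
    y = y.astype(float)
    selected = []
    best_sse = float(y @ y)                          # objective value with no nodes selected

    for _ in range(max_nodes):
        best_j, best_new_sse = None, best_sse
        for j in range(n):
            if j in selected:
                continue
            Q, _ = np.linalg.qr(K[:, selected + [j]])  # Householder QR of the candidate basis
            proj = Q @ (Q.T @ y)                       # projection of y onto its column space
            sse = float(np.sum((y - proj) ** 2))
            if sse < best_new_sse:
                best_j, best_new_sse = j, sse
        # early stopping: no remaining candidate reduces the objective enough
        if best_j is None or (best_sse - best_new_sse) < tol * best_sse:
            break
        selected.append(best_j)
        best_sse = best_new_sse

    if not selected:                                  # degenerate case: nothing selected
        return [], np.array([])
    Q, R = np.linalg.qr(K[:, selected])               # final coefficients on the selected nodes
    beta = np.linalg.solve(R, Q.T @ y)
    return selected, beta
```

For example, `greedy_sparse_kmse(X, y, gamma=0.5, max_nodes=30)` would return the indices of the chosen significant nodes together with their coefficients, so the resulting decision function depends only on those kernel columns; a replacement step in the spirit of ICS-KMSE would additionally revisit already-selected nodes and swap them out when doing so further reduces the objective.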
