Pruning least objective contribution in KMSE

Although the kernel minimum squared error (KMSE) method is computationally simple in training, requiring only the solution of a set of linear equations, its computational efficiency in the testing phase deteriorates seriously as the number of training samples grows. The underlying reason is that the solution of the naive KMSE is expressed in terms of all training samples in the feature space. Hence, in this paper a method of selecting significant nodes for KMSE is proposed. In each round, the presented algorithm prunes the training sample that makes the least contribution to the objective function, and is therefore called PLOC-KMSE. To accelerate the training procedure, a batch of so-called nonsignificant nodes is pruned in each round instead of one at a time; this speedup variant is named MPLOC-KMSE for short. To show the efficacy and feasibility of the proposed PLOC-KMSE and MPLOC-KMSE, experiments on benchmark data sets and real-world instances are reported. The experimental results demonstrate that PLOC-KMSE and MPLOC-KMSE require the fewest significant nodes compared with other algorithms, so their computational efficiency in the testing phase is the best, making them suitable for environments with strict demands on testing speed. In addition, the experiments show that MPLOC-KMSE accelerates the training procedure while reaching almost the same generalization performance, without sacrificing the computational efficiency of the testing phase. Finally, although PLOC and MPLOC are proposed in the regression domain, they can easily be extended to classification problems and to other algorithms such as kernel ridge regression.
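To make the pruning idea concrete, the following is a minimal sketch of backward pruning for KMSE, not the paper's implementation. It assumes an RBF kernel, a small ridge term for numerical stability, and a naive strategy that refits the model once per candidate node in each round (the paper presumably uses more efficient updates). Setting `batch > 1` mimics the MPLOC-style speedup of dropping several nonsignificant nodes per round. All function and variable names here (`ploc_kmse`, `fit_kmse`, `X_train`, etc.) are illustrative, not from the source.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kmse(K_nodes, y, lam=1e-3):
    # Least-squares fit of y ~ K_nodes @ alpha with ridge term lam.
    # K_nodes holds the kernel columns of the currently retained nodes.
    n_nodes = K_nodes.shape[1]
    A = K_nodes.T @ K_nodes + lam * np.eye(n_nodes)
    alpha = np.linalg.solve(A, K_nodes.T @ y)
    residual = K_nodes @ alpha - y
    return alpha, float(residual @ residual)   # squared-error objective

def ploc_kmse(X, y, n_keep, gamma=1.0, lam=1e-3, batch=1):
    # Greedy backward pruning: in each round, refit KMSE with every
    # candidate node temporarily removed and drop the node(s) whose
    # removal increases the objective the least, i.e. the nodes with
    # the least contribution to the objective function.
    K_full = rbf_kernel(X, X, gamma)           # n_samples x n_samples
    kept = list(range(X.shape[0]))
    while len(kept) > n_keep:
        scores = []
        for j in kept:
            trial = [i for i in kept if i != j]
            _, obj = fit_kmse(K_full[:, trial], y, lam)
            scores.append((obj, j))
        scores.sort()                          # smallest objective first
        n_drop = min(batch, len(kept) - n_keep)
        for _, j in scores[:n_drop]:
            kept.remove(j)
    alpha, _ = fit_kmse(K_full[:, kept], y, lam)
    return kept, alpha

# Usage sketch: keep 20 significant nodes, prune 5 per round (MPLOC-style).
# kept, alpha = ploc_kmse(X_train, y_train, n_keep=20, gamma=0.5, batch=5)
# y_pred = rbf_kernel(X_test, X_train[kept], 0.5) @ alpha
```

The testing-phase saving comes from the last line of the usage sketch: prediction only needs kernel evaluations against the retained significant nodes, not against every training sample.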
