The Kernel Least-Mean-Square Algorithm

The combination of the famed kernel trick and the least-mean-square (LMS) algorithm provides an interesting sample-by-sample update for an adaptive filter in reproducing kernel Hilbert spaces (RKHS), which this paper names the KLMS. Contrary to the accepted view in kernel methods, this paper shows that in the finite-training-data case the KLMS algorithm is well posed in RKHS without the addition of an extra regularization term to penalize solution norms, as was suggested by Kivinen [Kivinen, Smola, and Williamson, "Online Learning With Kernels," IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2165-2176, Aug. 2004] and Smale [Smale and Yao, "Online Learning Algorithms," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 145-176, 2006]. This result is the main contribution of the paper and enhances the present understanding of the LMS algorithm from a machine learning perspective. The effect of the KLMS step size is also studied from the viewpoint of regularization. Two experiments are presented to support our conclusion that, with finite data, the KLMS algorithm can be readily used in high-dimensional spaces, and particularly in RKHS, to derive nonlinear, stable algorithms with performance comparable to batch, regularized solutions.
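To make the sample-by-sample update concrete, the sketch below implements the KLMS recursion the abstract describes: each incoming sample is predicted with the current kernel expansion, and the step-size-scaled prediction error becomes the coefficient of a new kernel unit centered at that sample. This is a minimal Python illustration under stated assumptions, not the authors' code; the Gaussian kernel choice, the `step_size` and `sigma` values, and all function names are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two input vectors (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def klms(inputs, desired, step_size=0.5, sigma=0.5):
    """Sample-by-sample KLMS over a finite training sequence (illustrative sketch).

    The filter is the kernel expansion f(u) = sum_i a_i * k(c_i, u).
    For each new pair (u_n, d_n): predict with the current expansion,
    compute the a priori error e_n = d_n - f(u_n), then allocate a new
    unit centered at u_n with coefficient a_n = step_size * e_n.
    Note there is no explicit norm penalty: per the paper's claim, with
    finite data the step size itself plays the regularizing role.
    """
    centers, coeffs, errors = [], [], []
    for u, d in zip(inputs, desired):
        # Prediction of the current filter at the new input.
        y = sum(a * gaussian_kernel(c, u, sigma) for c, a in zip(centers, coeffs))
        e = d - y                      # a priori prediction error
        centers.append(u)              # new kernel unit at this sample
        coeffs.append(step_size * e)   # LMS-style update in the RKHS
        errors.append(e)
    return centers, coeffs, errors

def predict(u, centers, coeffs, sigma):
    """Evaluate the learned kernel expansion at a new input u."""
    return sum(a * gaussian_kernel(c, u, sigma) for c, a in zip(centers, coeffs))
```

A toy run on a nonlinear target shows the intended use; the data and hyperparameters here are illustrative only, not the paper's experimental setup:

```python
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=(200, 1))                    # scalar inputs
d = np.sin(3.0 * u[:, 0]) + 0.05 * rng.standard_normal(200)  # noisy nonlinear target
centers, coeffs, errors = klms(u, d, step_size=0.5, sigma=0.3)
print(predict([0.2], centers, coeffs, sigma=0.3))            # roughly sin(0.6)
```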

[1] Arnold Neumaier, et al. Solving Ill-Conditioned and Singular Linear Systems: A Tutorial on Regularization, 1998, SIAM Rev.

[2] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[3] Simon Haykin, et al. Neural Networks: A Comprehensive Foundation (3rd Edition), 2007.

[4] Alexander Rakhlin, et al. Stability Properties of Empirical Risk Minimization over Donsker Classes, 2006, J. Mach. Learn. Res.

[5] Jason Weston, et al. Fast Kernel Classifiers with Online and Active Learning, 2005, J. Mach. Learn. Res.

[6] Bernhard Schölkopf, et al. Iterative kernel principal component analysis for image modeling, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Andreu Català, et al. A Comparison between the Tikhonov and the Bayesian Approaches to Calculate Regularisation Matrices, 2004, Neural Processing Letters.

[8] Tong Zhang, et al. Learning Bounds for Kernel Regression Using Effective Data Dimensionality, 2005, Neural Computation.

[9] Charles A. Micchelli, et al. Learning the Kernel Function via Regularization, 2005, J. Mach. Learn. Res.

[10] Shie Mannor, et al. The kernel recursive least-squares algorithm, 2004, IEEE Transactions on Signal Processing.

[11] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.

[12] Alexander J. Smola, et al. Online learning with kernels, 2004, IEEE Transactions on Signal Processing.

[13] John C. Platt. A Resource-Allocating Network for Function Interpolation, 1991, Neural Computation.

[14] C. Vogel. Nonsmooth Regularization, 1997.

[15] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, 1999, Advances in Kernel Methods.

[16] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math.

[17] Gene H. Golub, et al. Regularization by Truncated Total Least Squares, 1997, SIAM J. Sci. Comput.

[18] T. Poggio, et al. The Mathematics of Learning: Dealing with Data, 2005 International Conference on Neural Networks and Brain.

[19] G. Stewart. Introduction to Matrix Computations, 1973.

[20] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[21] Weifeng Liu, et al. Kernel LMS, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[22] J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique, 1902.

[23] N. Aronszajn. Theory of Reproducing Kernels, 1950.

[24] Paulo Sergio Ramirez, et al. Fundamentals of Adaptive Filtering, 2002.

[25] S. Haykin, et al. Adaptive Filter Theory, 1986.

[26] Robert F. Harrison, et al. A kernel based adaline, 1999, ESANN.

[27] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[28] Charles A. Micchelli, et al. Learning Convex Combinations of Continuously Parameterized Basic Kernels, 2005, COLT.

[29] Yuan Yao, et al. Online Learning Algorithms, 2006, Found. Comput. Math.

[31] S. Haykin. Neural Networks: A Comprehensive Foundation, 1994.

[32] Tomaso A. Poggio, et al. Regularization Theory and Neural Networks Architectures, 1995, Neural Computation.

[33] Bernhard Schölkopf, et al. A Unifying View of Wiener and Volterra Theory and Polynomial Kernel Regression, 2006, Neural Computation.