An Information Theoretic Approach of Designing Sparse Kernel Adaptive Filters

This paper presents an information theoretic approach to designing sparse kernel adaptive filters. To determine which data are useful to learn and which are redundant, a subjective information measure called surprise is introduced. Surprise captures the amount of information a datum contains that is transferable to a learning system. Based on this concept, we propose a systematic sparsification scheme that can drastically reduce the time and space complexity without harming the performance of kernel adaptive filters. Examples are presented for nonlinear regression, short-term chaotic time-series prediction, and long-term time-series forecasting.
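The surprise criterion is the negative log likelihood of a new datum under the filter's current predictive model; under a Gaussian predictive distribution this reduces, up to constants, to S ≈ ln σ + e²/(2σ²), where e is the prediction error and σ² the predictive variance. The sketch below pairs this criterion with a kernel least-mean-square update as a minimal illustration. The class name `SurpriseKLMS`, the threshold values, the regularizer, and the simple dictionary-projection proxy for the predictive variance are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * width ** 2))

class SurpriseKLMS:
    """Kernel LMS filter with surprise-based sparsification (a sketch).

    Surprise is approximated as the negative log predictive likelihood
    under a Gaussian model: S ~= 0.5*ln(var) + err^2 / (2*var), up to
    constants. Data with S above t_abnormal are treated as outliers,
    data with S below t_redundant as redundant; only data in between
    are learned. All parameter defaults here are illustrative.
    """

    def __init__(self, eta=0.5, kernel_width=1.0, reg=0.01,
                 t_abnormal=20.0, t_redundant=-1.0):
        self.eta = eta            # KLMS step size
        self.kw = kernel_width
        self.reg = reg            # regularizer for the variance proxy
        self.t_abn = t_abnormal
        self.t_red = t_redundant
        self.centers = []         # dictionary of stored inputs
        self.alphas = []          # corresponding expansion coefficients

    def predict(self, x):
        if not self.centers:
            return 0.0
        k = np.array([gaussian_kernel(x, c, self.kw) for c in self.centers])
        return float(np.dot(self.alphas, k))

    def _predictive_variance(self, x):
        # Crude stand-in for a GP predictive variance: kernel
        # self-similarity minus the projection onto the dictionary.
        if not self.centers:
            return 1.0 + self.reg
        k = np.array([gaussian_kernel(x, c, self.kw) for c in self.centers])
        K = np.array([[gaussian_kernel(a, b, self.kw) for b in self.centers]
                      for a in self.centers])
        K += self.reg * np.eye(len(self.centers))
        return max(1.0 - k @ np.linalg.solve(K, k) + self.reg, 1e-12)

    def update(self, x, d):
        err = d - self.predict(x)
        var = self._predictive_variance(x)
        surprise = 0.5 * np.log(var) + err ** 2 / (2 * var)
        if self.t_red < surprise < self.t_abn:
            # Learnable datum: grow the expansion (standard KLMS step).
            self.centers.append(np.atleast_1d(np.asarray(x, dtype=float)))
            self.alphas.append(self.eta * err)
        # Otherwise the datum is abnormal or redundant and is skipped.
        return err, surprise
```

A toy run shows the intended effect: as the dictionary covers the input space, predictive variance shrinks and most incoming samples fall below the redundancy threshold, so the network stops growing.

```python
rng = np.random.default_rng(0)
f = SurpriseKLMS()
for _ in range(200):
    u = rng.uniform(-1, 1)
    d = np.sin(3 * u) + 0.1 * rng.standard_normal()
    f.update(np.array([u]), d)
print(f"{len(f.centers)} of 200 samples retained as centers")
```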
