Online Nonlinear Estimation via Iterative $L^2$-Space Projections: Reproducing Kernel of Subspace

We propose a novel online learning paradigm for nonlinear-function estimation tasks, based on iterative projections in the $L^2$ space with a probability measure reflecting the stochastic properties of the input signals. The proposed learning algorithm exploits the reproducing kernel of the so-called dictionary subspace, based on the fact that any finite-dimensional space of functions has a reproducing kernel characterized by the Gram matrix. The $L^2$-space geometry provides, in principle, the best decorrelation property. The proposed learning paradigm differs significantly from the conventional kernel-based learning paradigm in two senses: first, the whole space is not a reproducing kernel Hilbert space; and second, the minimum mean squared error estimator gives the best approximation of the desired nonlinear function in the dictionary subspace. The algorithm remains efficient both in computing inner products and in updating the Gram matrix when the dictionary grows. Monotone approximation, asymptotic optimality, and convergence of the proposed algorithm are analyzed based on a variable-metric version of the adaptive projected subgradient method. Numerical examples on real data show the efficacy of the proposed algorithm over a variety of methods, including the extended Kalman filter and batch machine-learning methods such as the multilayer perceptron.
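To make the Gram-matrix characterization concrete, the following is a minimal sketch in illustrative notation (the symbols $\kappa_j$, $G$, and $\mathcal{M}$ are ours, not necessarily the paper's), assuming the dictionary functions are linearly independent so that the Gram matrix is invertible. Suppose the dictionary $\{\kappa_1, \dots, \kappa_r\} \subset L^2(\mu)$ spans the subspace $\mathcal{M}$, and let $G \in \mathbb{R}^{r \times r}$ be its Gram matrix with entries $G_{ij} := \langle \kappa_i, \kappa_j \rangle_{L^2(\mu)}$. Then
$$
k_{\mathcal{M}}(x, y) = \boldsymbol{\kappa}(x)^{\top} G^{-1} \boldsymbol{\kappa}(y),
\qquad
\boldsymbol{\kappa}(x) := \bigl(\kappa_1(x), \dots, \kappa_r(x)\bigr)^{\top},
$$
is a reproducing kernel of $\mathcal{M}$ with respect to the $L^2(\mu)$ inner product: for every $f = \sum_i c_i \kappa_i \in \mathcal{M}$,
$$
\langle f, k_{\mathcal{M}}(\cdot, y) \rangle_{L^2(\mu)}
= \sum_{i, j, l} c_i \, G_{ij} \, (G^{-1})_{jl} \, \kappa_l(y)
= \sum_i c_i \kappa_i(y) = f(y).
$$
This is the standard finite-dimensional fact the abstract refers to; only the Gram matrix (and its update as the dictionary grows) is needed to evaluate the kernel and the associated projections.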
