The Kernel Adaptive Autoregressive-Moving-Average Algorithm

In this paper, we present a novel kernel adaptive recurrent filtering algorithm based on the autoregressive-moving-average (ARMA) model, trained with recurrent stochastic gradient descent in reproducing kernel Hilbert spaces (RKHSs). This kernelized recurrent system, the kernel adaptive ARMA (KAARMA) algorithm, brings together the theories of adaptive signal processing and recurrent neural networks (RNNs), extending the current theory of kernel adaptive filtering (KAF) via the representer theorem to include feedback. Compared with classical feedforward KAF methods, the KAARMA algorithm provides general nonlinear solutions for complex dynamical systems in a state-space representation, with a deferred teacher signal, by forward-propagating the hidden states. We demonstrate its ability to provide exact solutions with compact structures by solving a set of benchmark nondeterministic polynomial (NP)-complete problems involving grammatical inference. Simulation results show that the KAARMA algorithm outperforms equivalent input-space recurrent architectures using first- and second-order RNNs, demonstrating its potential as an effective learning solution for the identification and synthesis of deterministic finite automata.
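To make the contrast with feedforward KAF concrete, the following is a minimal illustrative sketch of the kernel least-mean-square (KLMS) filter, the memoryless baseline the abstract compares against; it is not the KAARMA algorithm itself, and the function names, step size, and kernel width here are illustrative choices, not values from the paper. By the representer theorem, the learned function is a kernel expansion over past inputs, with coefficients set by the instantaneous prediction errors:

```python
import numpy as np

def gaussian_kernel(x, c, sigma=1.0):
    # Gaussian (RBF) kernel between input x and center c
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

def klms(inputs, targets, eta=0.5, sigma=1.0):
    """Kernel least-mean-square: a feedforward KAF baseline.

    Grows a dictionary of centers online; the prediction at each
    step is a kernel expansion weighted by past prediction errors.
    """
    centers, alphas, predictions = [], [], []
    for x, d in zip(inputs, targets):
        # Evaluate the current kernel expansion at the new input.
        y = sum(a * gaussian_kernel(x, c, sigma)
                for a, c in zip(alphas, centers))
        e = d - y                # instantaneous error
        centers.append(x)        # allocate a new kernel unit
        alphas.append(eta * e)   # error-scaled coefficient
        predictions.append(y)
    return np.array(predictions), centers, alphas

# Example: learn a static nonlinearity from streaming samples.
rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=(500, 1))
ds = np.sin(xs[:, 0])
preds, centers, alphas = klms(xs, ds)
```

Because KLMS has no state feedback, each prediction depends only on the current input; KAARMA departs from this by feeding hidden states forward through time, which is what allows it to model dynamical systems such as finite automata.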
