Stochastic Kernel Temporal Difference for Reinforcement Learning
2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

This paper introduces a kernel adaptive filter that uses stochastic gradient updates on temporal differences, kernel TD(λ), to estimate the state-action value function Q in reinforcement learning. Kernel methods are powerful for solving nonlinear problems, but their growing computational complexity and memory requirements limit their applicability in practical scenarios. To overcome this, the quantization approach introduced in [1] is applied. To help understand the behavior of the algorithm and illustrate the role of its parameters, we apply it to a two-dimensional spatial navigation task. Because eligibility traces are commonly used in TD learning to improve data efficiency, we examine how the eligibility trace parameter λ interacts with the step size and the filter size. Moreover, kernel TD(0) is applied to neural decoding of an eight-target center-out reaching task performed by a monkey. Results show that the method can effectively learn the brain-state-to-action mapping for this task.
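To make the update rule concrete, the following is a minimal Python sketch of a quantized kernel TD(λ) estimator with a Gaussian kernel, written to follow the description above. It is illustrative only: it estimates a state value for brevity (the paper estimates the state-action value Q), the eligibility traces here decay by γλ as in standard discounted TD(λ), and all names (QuantizedKernelTD, eps_q, etc.) are hypothetical rather than taken from the paper.

```python
import numpy as np

class QuantizedKernelTD:
    """Sketch of kernel TD(lambda) with quantization (illustrative, not the paper's code).

    The value estimate is V(x) = sum_i alpha_i * k(x, c_i) with a Gaussian kernel.
    A new input is merged into the nearest existing center when it lies within
    the quantization size eps_q; otherwise it is added as a new center.
    """

    def __init__(self, kernel_width=1.0, step_size=0.1, gamma=0.9, lam=0.5, eps_q=0.1):
        self.h = kernel_width      # Gaussian kernel bandwidth
        self.eta = step_size       # learning rate
        self.gamma = gamma         # discount factor
        self.lam = lam             # eligibility-trace decay
        self.eps_q = eps_q         # quantization size (controls filter growth)
        self.centers = []          # kernel centers (the "dictionary")
        self.alphas = []           # expansion coefficients
        self.traces = []           # eligibility trace per center

    def _k(self, x, c):
        d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.h ** 2))

    def value(self, x):
        return sum(a * self._k(x, c) for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        # TD error for the transition x -> x_next.
        v_next = 0.0 if terminal else self.value(x_next)
        delta = reward + self.gamma * v_next - self.value(x)

        # Decay existing traces, then stamp the current input.
        self.traces = [self.gamma * self.lam * e for e in self.traces]

        if self.centers:
            dists = [np.linalg.norm(np.asarray(x, dtype=float) - c) for c in self.centers]
            j = int(np.argmin(dists))
        else:
            dists, j = [], -1

        if self.centers and dists[j] <= self.eps_q:
            # Quantization: reuse the nearest center instead of growing the network.
            self.traces[j] += 1.0
        else:
            # Add the current input as a new kernel center.
            self.centers.append(np.asarray(x, dtype=float))
            self.alphas.append(0.0)
            self.traces.append(1.0)

        # Stochastic gradient step on all coefficients, weighted by eligibility.
        self.alphas = [a + self.eta * delta * e for a, e in zip(self.alphas, self.traces)]
        return delta
```

In a task like the navigation example, one such estimator per action (or a state-action kernel) would be combined with an exploratory policy; the quantization size eps_q then trades off filter size against approximation accuracy, while λ and the step size govern how credit is spread back over recent inputs.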

[1] Yaakov Engel. Algorithms and representations for reinforcement learning. Ph.D. thesis, 2005.

[2] Xin Xu et al. Kernel Least-Squares Temporal Difference Learning, 2006.

[3] Alexander J. Smola et al. Learning with kernels, 1998.

[4] André da Motta Salles Barreto et al. On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization. NIPS, 2012.

[5] Nicholas K. Jong et al. Kernel-Based Models for Reinforcement Learning, 2006.

[6] Weifeng Liu et al. An Information Theoretic Approach of Designing Sparse Kernel Adaptive Filters. IEEE Transactions on Neural Networks, 2009.

[7] S. Haykin et al. Kernel Least-Mean-Square Algorithm, 2010.

[8] José Carlos Príncipe et al. Coadaptive Brain-Machine Interface via Reinforcement Learning. IEEE Transactions on Biomedical Engineering, 2009.

[9] Badong Chen et al. Quantized Kernel Least Mean Square Algorithm. IEEE Transactions on Neural Networks and Learning Systems, 2012.

[10] Peter Dayan et al. Q-learning. Machine Learning, 1992.

[11] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 1988.

[12] José Carlos Príncipe et al. Reinforcement learning via kernel temporal difference. 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011.

[13] J. C. Sanchez et al. Control of a center-out reaching task using a reinforcement learning Brain-Machine Interface. 2011 5th International IEEE/EMBS Conference on Neural Engineering, 2011.

[14] Liming Xiang et al. Kernel-Based Reinforcement Learning. ICIC, 2006.

[15] Michael I. Jordan et al. Kernel independent component analysis. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.

[16] Weifeng Liu et al. Kernel Adaptive Filtering, 2010.

[17] Bernhard E. Boser et al. A training algorithm for optimal margin classifiers. COLT '92, 1992.

[18] Peter Dayan et al. Technical Note: Q-Learning. Machine Learning, 2004.

[19] Justin A. Boyan. Least-Squares Temporal Difference Learning. ICML, 1999.

[20] Xin Xu et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning. IEEE Transactions on Neural Networks, 2007.

[21] Gunnar Rätsch et al. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 1999.