Sparse Kernel-SARSA(λ) with an Eligibility Trace

We introduce the first online kernelized version of SARSA(λ) to permit sparsification for arbitrary λ for 0 ≤ λ ≤ 1; this is possible via a novel kernelization of the eligibility trace that is maintained separately from the kernelized value function. This separation is crucial for preserving the functional structure of the eligibility trace when using sparse kernel projection techniques that are essential for memory efficiency and capacity control. The result is a simple and practical Kernel-SARSA(λ) algorithm for general 0 ≤ λ ≤ 1 that is memory-efficient in comparison to standard SARSA(λ) (using various basis functions) on a range of domains including a real robotics task running on a Willow Garage PR2 robot.

[1]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[2]  Carl E. Rasmussen,et al.  Gaussian Processes in Reinforcement Learning , 2003, NIPS.

[3]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[4]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[5]  Shie Mannor,et al.  Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.

[6]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[7]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[8]  Shie Mannor,et al.  Reinforcement learning with Gaussian processes , 2005, ICML.

[9]  Peter Stone,et al.  Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration , 2010, ECML/PKDD.

[10]  Barbara Caputo,et al.  The projectron: a bounded kernel-based Perceptron , 2008, ICML '08.

[11]  Justin A. Boyan,et al.  Least-Squares Temporal Difference Learning , 1999, ICML.

[12]  Yew-Soon Ong,et al.  Advances in Natural Computation, First International Conference, ICNC 2005, Changsha, China, August 27-29, 2005, Proceedings, Part I , 2005, ICNC.

[13]  Gavin Taylor,et al.  Kernelized value function approximation for reinforcement learning , 2009, ICML '09.

[14]  N. Aronszajn Theory of Reproducing Kernels. , 1950 .

[15]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[16]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[17]  Nicholas K. Jong,et al.  Kernel-Based Models for Reinforcement Learning , 2006 .

[18]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[19]  Tapio Elomaa,et al.  Machine Learning: ECML 2002 , 2002, Lecture Notes in Computer Science.

[20]  Xin Xu,et al.  Kernel Least-Squares Temporal Difference Learning , 2006 .

[21]  Xin Xu,et al.  A Sparse Kernel-Based Least-Squares Temporal Difference Algorithm for Reinforcement Learning , 2006, ICNC.

[22]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[23]  Xin Xu,et al.  Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.

[24]  Shie Mannor,et al.  Sparse Online Greedy Support Vector Regression , 2002, ECML.

[25]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[26]  Yaakov Engel,et al.  Algorithms and representations for reinforcement learning (עם תקציר בעברית, תכן ושער נוסף: אלגוריתמים וייצוגים ללמידה מחיזוקים.; אלגוריתמים וייצוגים ללמידה מחיזוקים.) , 2005 .

[27]  Manfred Opper,et al.  Sparse Representation for Gaussian Process Models , 2000, NIPS.