Kernel Rewards Regression: An Information Efficient Batch Policy Iteration Approach

We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method obtains useful policies from only a few observed state-action transitions. It treats the Reinforcement Learning problem as a regression task to which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for the estimation of the Q-function.
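To make the regression view concrete, the following is a minimal sketch of how observed rewards can serve as regression targets for a kernelised Q-function evaluated under a fixed policy. The RBF kernel, the ridge regulariser, and all names and signatures here are illustrative assumptions for a generic kernel reward regression, not the exact formulation or notation of the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma_k=1.0):
    """Gaussian (RBF) kernel between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma_k * d2)

def fit_q_by_reward_regression(SA, SA_next, rewards, gamma=0.95, lam=1e-3):
    """Illustrative sketch (not the paper's exact algorithm):
    fit Q(s, a) = sum_j alpha_j * k((s, a), (s_j, a_j)) so that the observed
    rewards are explained by the Bellman residual
        r_i  ~  Q(s_i, a_i) - gamma * Q(s'_i, a'_i),
    solved as a ridge-regularised least-squares problem.

    SA      : (n, d) array of observed state-action pairs (s_i, a_i)
    SA_next : (n, d) array of successor pairs (s'_i, a'_i), with a'_i chosen
              by the policy currently being evaluated
    rewards : (n,) array of observed immediate rewards r_i
    """
    K = rbf_kernel(SA, SA)            # k((s_i, a_i), (s_j, a_j))
    K_next = rbf_kernel(SA_next, SA)  # k((s'_i, a'_i), (s_j, a_j))
    M = K - gamma * K_next            # design matrix of Bellman residuals
    # Ridge solution: alpha = (M^T M + lam * I)^{-1} M^T r
    alpha = np.linalg.solve(M.T @ M + lam * np.eye(M.shape[1]), M.T @ rewards)

    def q(sa_query):
        """Evaluate the fitted Q-function at new state-action pairs."""
        return rbf_kernel(np.atleast_2d(sa_query), SA) @ alpha

    return q
```

Because only the kernel enters the fit, structural prior knowledge about the state space (e.g. invariances) can be injected by swapping `rbf_kernel` for a different positive definite kernel; a policy improvement step would then pick, in each state, the action maximising the fitted `q`.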
