From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework

In many applications, engineers must estimate a function of the state of a dynamic system. A sequence of samples drawn from this unknown function is observed while the system moves from state to state, and the problem is to generalize these observations to unvisited states. One solution among several is to fit a parameterized family of functions to the observed samples as closely as possible. This is the first problem addressed by the proposed kernel-based Bayesian filtering approach, which also quantifies the reduction in uncertainty as more samples are acquired. Classical methods cannot handle the case where the samples themselves are not directly observable and only a nonlinear mapping of them is available, as happens when a special sensor must be used or when the Bellman equation is solved in order to control the system. The approach proposed in this paper extends to this more difficult case. An application of this indirect function approximation scheme to reinforcement learning is also presented, along with a set of experiments that demonstrate the efficiency of the kernel-based Bayesian approach.

Index Terms: supervised learning; reinforcement learning; Bayesian filtering; kernel methods
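To make the direct-observation case concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a fixed dictionary of Gaussian kernels, a linear observation model, and a standard Kalman update on the kernel weights. The nonlinear-observation case described in the abstract would need a nonlinear filter and is not covered here. The target function (`np.sinc`), the kernel centers and width, the noise level, and the helper names `gaussian_features` and `kalman_update` are all illustrative assumptions. The sketch tracks the predictive variance at one query state to show how uncertainty shrinks as samples arrive.

```python
import numpy as np

def gaussian_features(x, centers, width):
    """Evaluate Gaussian kernels centered at `centers` on scalar inputs `x`."""
    x = np.atleast_1d(x).astype(float)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def kalman_update(mean, cov, phi, y, noise_var):
    """One filtering step for the linear observation y = phi @ theta + noise."""
    innovation = y - phi @ mean
    s = phi @ cov @ phi + noise_var      # predicted observation variance
    gain = cov @ phi / s                 # Kalman gain
    mean = mean + gain * innovation
    cov = cov - np.outer(gain, gain) * s
    return mean, cov

rng = np.random.default_rng(0)
centers = np.linspace(-3.0, 3.0, 15)     # assumed fixed kernel dictionary
width = 0.5                              # assumed kernel width
noise_var = 0.01                         # assumed observation noise variance

# Gaussian prior over the kernel weights.
mean = np.zeros(centers.size)
cov = np.eye(centers.size)

# Track the predictive variance at one query state as samples arrive.
x_query = 1.0
phi_query = gaussian_features(x_query, centers, width)[0]
variances = [phi_query @ cov @ phi_query]

for _ in range(200):
    x = rng.uniform(-3.0, 3.0)           # state visited by the system
    y = np.sinc(x) + rng.normal(scale=np.sqrt(noise_var))
    phi = gaussian_features(x, centers, width)[0]
    mean, cov = kalman_update(mean, cov, phi, y, noise_var)
    variances.append(phi_query @ cov @ phi_query)

prediction = phi_query @ mean
print(f"prediction at x={x_query}: {prediction:.3f}")
print(f"variance before: {variances[0]:.3f}, after: {variances[-1]:.5f}")
```

Because each update subtracts a positive semidefinite rank-one term from the covariance, the predictive variance at any query state cannot increase from one sample to the next in this linear setting.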
