Adaptive autonomous control using online value iteration with Gaussian processes

In this paper, we present a novel approach to controlling a robotic system online from scratch based on the reinforcement learning principle. In contrast to other approaches, our method learns the system dynamics and the value function separately, which makes it possible to identify their individual characteristics and therefore to adapt easily to changing conditions. The major problem in learning control policies lies in the high-dimensional state and action spaces, which need to be explored in order to identify the optimal policy. Specifically, we propose an approach that learns the system dynamics and the value function in an alternating fashion based on Gaussian process models. Additionally, to reduce the computation time and to make the method applicable to online learning, we present an efficient sparsification technique. In experiments carried out with a real miniature blimp, we demonstrate that our approach can learn height control online. Further results obtained with an inverted pendulum show that our method requires less data to achieve the same performance as an offline learning approach.
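To make the alternating scheme more concrete, the sketch below shows a minimal, numpy-only version of model-based value iteration with Gaussian process regression. It is an illustrative sketch under simplifying assumptions, not the method from the paper: the toy 1-D dynamics, the discrete action set, the reward function, and all hyperparameters are hypothetical, and the GP models are plain full GPs without the sparsification mentioned above.

```python
# Minimal sketch (assumptions: numpy-only GP regression with a fixed RBF
# kernel, a hypothetical 1-D linear toy system, a small discrete action set,
# and a quadratic reward; none of this is taken from the paper itself).
import numpy as np

class GP:
    """Basic Gaussian process regression with a fixed RBF kernel."""
    def __init__(self, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
        self.l, self.sf2, self.sn2 = lengthscale, signal_var, noise_var

    def _k(self, A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return self.sf2 * np.exp(-0.5 * d2 / self.l**2)

    def fit(self, X, y):
        self.X = X
        K = self._k(X, X) + self.sn2 * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)

    def predict(self, Xs):
        return self._k(Xs, self.X) @ self.alpha

# Toy experience: transitions (s, a) -> s' from a damped 1-D system.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, (60, 1))
A = rng.choice([-0.1, 0.0, 0.1], size=(60, 1))
S_next = 0.9 * S + A + 0.01 * rng.standard_normal((60, 1))

actions = np.array([-0.1, 0.0, 0.1])
gamma, support = 0.9, np.linspace(-1, 1, 25).reshape(-1, 1)

def reward(s):                      # reward peaks at the target state s = 0
    return -s[:, 0]**2

# Step 1: learn the dynamics model from the observed transitions.
dynamics = GP(lengthscale=0.5)
dynamics.fit(np.hstack([S, A]), S_next[:, 0])

# Step 2: value iteration on the support states using the learned dynamics.
V = np.zeros(len(support))
value_gp = GP(lengthscale=0.3)
for _ in range(50):
    value_gp.fit(support, V)                    # refit the value function GP
    q = []
    for a in actions:                           # Bellman backup over actions
        sa = np.hstack([support, np.full((len(support), 1), a)])
        s_next = dynamics.predict(sa).reshape(-1, 1)
        q.append(reward(s_next) + gamma * value_gp.predict(s_next))
    V = np.max(np.stack(q), axis=0)

print("Value at the support state closest to the target:",
      V[np.argmin(np.abs(support[:, 0]))])
```

The structure mirrors the alternation described in the abstract: the dynamics GP is fit from observed transitions, and the value function GP is refit after each Bellman backup over the support states; in an online setting, both fitting steps would be repeated as new transitions arrive.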
