A Gaussian Process Reinforcement Learning Algorithm with Adaptability and Minimal Tuning Requirements

We present a novel Bayesian reinforcement learning algorithm that addresses the issues of model bias and exploration overhead. The algorithm combines aspects of several state-of-the-art model-based reinforcement learning methods built on Gaussian Processes, in order to make better use of online data samples. It uses a smooth reward function whose value is derived directly from the environment state, and it handles continuous states and actions in a coherent way while minimising the need for expert knowledge in parameter tuning. We analyse and discuss the practical benefits of the chosen approach in comparison to more traditional methodological choices, and illustrate the use of the algorithm on a motor control problem involving a simulated two-link arm.
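As a rough illustration of the ingredients named above, and not of the paper's actual implementation, the sketch below assumes a PILCO-style setup: independent squared-exponential Gaussian Process regressors learn state-transition deltas from observed (state, action) inputs, and a smooth saturating reward is computed directly from the environment state. All function names, hyperparameters, and dimensionalities are illustrative assumptions.

import numpy as np

# Minimal sketch of a GP dynamics model and a smooth state-derived reward
# (illustrative assumptions only, not the paper's implementation).

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B.
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale ** 2)

class GPDynamicsModel:
    # Independent GP regressors (sharing one kernel) over each state dimension,
    # trained on observed (state, action) inputs to predict the state change.
    def __init__(self, noise_var=1e-2):
        self.noise_var = noise_var

    def fit(self, X, Y):
        # X: (N, state_dim + action_dim) inputs, Y: (N, state_dim) observed deltas.
        self.X = X
        K = se_kernel(X, X) + self.noise_var * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)
        self.alpha = self.K_inv @ Y  # K^{-1} Y, reused for every prediction

    def predict(self, x_star):
        # Posterior mean and variance of the state delta at a new input x_star.
        k_star = se_kernel(self.X, x_star[None, :])  # shape (N, 1)
        mean = (k_star.T @ self.alpha).ravel()
        var = se_kernel(x_star[None, :], x_star[None, :]) - k_star.T @ self.K_inv @ k_star
        return mean, var.item()

def smooth_reward(state, target, width=0.5):
    # Saturating, everywhere-differentiable reward derived only from the state.
    return float(np.exp(-0.5 * np.sum((state - target) ** 2) / width ** 2))

# Example: fit on random transitions for a 4-dim state / 2-dim action system
# (e.g. a two-link arm) and query the learned model once.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
Y = 0.1 * rng.normal(size=(50, 4))
model = GPDynamicsModel()
model.fit(X, Y)
delta_mean, delta_var = model.predict(np.zeros(6))
reward = smooth_reward(np.zeros(4), target=np.ones(4))

A saturating exponential of the squared distance to a target is a common choice of smooth, state-derived reward because it is bounded and differentiable everywhere, which simplifies propagating model uncertainty through the reward; the actual reward used in the paper may differ.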
