Efficient reinforcement learning for robots using informative simulated priors

Autonomous learning through interaction with the physical world is a promising approach to designing controllers and decision-making policies for robots. Unfortunately, learning on robots is often difficult because many learning algorithms require a large number of samples. Simulators can reduce the number of samples needed from the robot by incorporating prior knowledge of the dynamics into the learning algorithm. In this paper we present a novel method for transferring data from a simulator to a robot, using simulated data as a prior for real-world learning. A Bayesian nonparametric prior is learned from a potentially black-box simulator, and the mean of this learned prior is used as the prior mean function for the Probabilistic Inference for Learning Control (PILCO) algorithm. The simulated prior improves the convergence rate and performance of PILCO by guiding the policy search in regions of the state space that the robot has not yet observed. Simulated and hardware results demonstrate the benefits of incorporating this prior knowledge into the learning framework.
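The core idea of the abstract can be sketched in a few lines: fit a Gaussian process to cheap simulator rollouts, then use that GP's posterior mean as the prior mean for a second GP trained on scarce real-robot data (the real GP then only has to model the residual between reality and the simulator). The sketch below is a minimal illustration under assumed toy dynamics, not the paper's PILCO implementation; the helper names `rbf` and `gp_posterior_mean` and the `sin`-based "simulator" are hypothetical.

```python
import numpy as np

def rbf(A, B, ls=0.5, var=1.0):
    # Squared-exponential kernel between row-vector inputs A and B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_posterior_mean(Xtr, ytr, Xte, noise=1e-2):
    # Zero-mean GP regression: posterior mean at test inputs Xte.
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return rbf(Xte, Xtr) @ np.linalg.solve(K, ytr)

rng = np.random.default_rng(0)

# "Simulator": a biased model of the true dynamics y = sin(x),
# queried densely because simulator samples are cheap.
X_sim = rng.uniform(-3, 3, (40, 1))
y_sim = 0.9 * np.sin(X_sim[:, 0])

# A handful of expensive real-robot samples of the true dynamics.
X_real = rng.uniform(-3, 3, (5, 1))
y_real = np.sin(X_real[:, 0]) + 0.01 * rng.standard_normal(5)

# Simulated prior: the GP mean learned from simulator data.
m_sim = lambda X: gp_posterior_mean(X_sim, y_sim, X)

# Real-world GP with the simulated mean as its prior mean:
# model the residual y_real - m_sim(X_real), add m_sim back at test time.
X_test = np.linspace(-3, 3, 50)[:, None]
resid = gp_posterior_mean(X_real, y_real - m_sim(X_real), X_test)
y_pred = m_sim(X_test) + resid
```

Far from the five real samples, a zero-mean GP would fall back to predicting zero, whereas this model falls back to the simulator's (imperfect but informative) prediction, which is what lets the policy search make sensible decisions in unvisited regions of the state space.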
