Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

Over the last few years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to move away from precise but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems that can interactively perform tasks such as playing with children. In this paper, we demonstrate how a low-cost, off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, entirely from scratch. Our manipulator is inaccurate and provides no pose feedback. To learn a controller in the workspace of a Kinect-style depth camera, we use a model-based reinforcement learning technique. Our learning method is data efficient, reduces model bias, and deals with several noise sources in a principled way during long-term planning. We present a way of incorporating state-space constraints into the learning process and analyze the learning gain obtained by exploiting the sequential structure of the stacking task.
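To make the described approach concrete, the following is a minimal, hypothetical sketch of a PILCO-style model-based RL loop, not the authors' implementation. The paper's method uses analytic moment matching for uncertainty propagation and gradient-based policy search; this sketch substitutes Monte Carlo particle rollouts and random search so it stays short and dependency-free. All names (GPDynamics, rollout_cost, improve_policy) and all constants are illustrative assumptions.

```python
# Hypothetical sketch of a PILCO-style learning loop (not the authors' code):
# a GP dynamics model, Monte Carlo propagation of model uncertainty, and a
# saturating cost with a hinge penalty standing in for state-space constraints.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = v * exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

class GPDynamics:
    """One-step Gaussian process model of s' = f(s, a) from observed transitions."""
    def __init__(self, X, Y, noise=1e-2):
        # X: (N, state_dim + action_dim) inputs, Y: (N, state_dim) next states.
        self.X, self.Y = X, Y
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))

    def predict(self, Xs):
        """Posterior mean (Ns, state_dim) and marginal variance (Ns, 1)."""
        Ks = rbf_kernel(Xs, self.X)
        v = np.linalg.solve(self.L, Ks.T)
        var = np.clip(1.0 - (v**2).sum(0), 1e-9, None)
        return Ks @ self.alpha, var[:, None]

def rollout_cost(model, policy, s0, horizon=20, n_samples=30, seed=0):
    """Long-term planning under model uncertainty: each one-step GP prediction
    is sampled, so model uncertainty shows up as spread across particles. A
    hinge penalty approximates the paper's state-space constraints."""
    rng = np.random.default_rng(seed)
    S, total = np.tile(s0, (n_samples, 1)), 0.0
    for _ in range(horizon):
        mean, var = model.predict(np.hstack([S, policy(S)]))
        S = mean + np.sqrt(var) * rng.standard_normal(mean.shape)
        total += (1.0 - np.exp(-0.5 * (S**2).sum(1))).mean()  # saturating cost, target at origin
        total += 10.0 * np.maximum(np.abs(S) - 1.5, 0.0).sum(1).mean()  # keep states in [-1.5, 1.5]
    return total

def improve_policy(model, theta, s0, iters=50, step=0.1, seed=1):
    """Random-search improvement of a linear policy on the learned model
    (a stand-in for the analytic policy gradients used in PILCO)."""
    rng = np.random.default_rng(seed)
    best = rollout_cost(model, lambda S: S @ theta, s0)
    for _ in range(iters):
        cand = theta + step * rng.standard_normal(theta.shape)
        cost = rollout_cost(model, lambda S: S @ cand, s0)
        if cost < best:
            theta, best = cand, cost
    return theta

# Data-efficient loop: fit the model to all transitions seen so far, then
# improve the policy on the model (transitions are simulated here).
state_dim, action_dim = 2, 1
rng = np.random.default_rng(2)
X = rng.standard_normal((40, state_dim + action_dim))
Y = 0.9 * X[:, :state_dim] + 0.1 * X[:, state_dim:] + 0.01 * rng.standard_normal((40, state_dim))
model = GPDynamics(X, Y)
theta = improve_policy(model, np.zeros((state_dim, action_dim)), s0=np.ones(state_dim))
print("predicted long-term cost:", rollout_cost(model, lambda S: S @ theta, np.ones(state_dim)))
```

The key design point the sketch preserves is that planning happens entirely on the learned probabilistic model, with predictive variance carried through the whole horizon rather than discarded; this is what lets a method of this family learn from only a handful of real trials.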
