Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning

Over the last few years, there has been substantial progress in robust manipulation in unstructured environments. The long-term goal of our work is to move away from precise but very expensive robotic systems and to develop affordable, potentially imprecise, self-adaptive manipulator systems that can interactively perform tasks such as playing with children. In this paper, we demonstrate how a low-cost, off-the-shelf robotic system can learn closed-loop policies for a stacking task in only a handful of trials, entirely from scratch. Our manipulator is inaccurate and provides no pose feedback. To learn a controller in the workspace of a Kinect-style depth camera, we use a model-based reinforcement learning technique. Our learning method is data efficient, reduces model bias, and deals with several noise sources in a principled way during long-term planning. We present a way of incorporating state-space constraints into the learning process and analyze the learning gain obtained by exploiting the sequential structure of the stacking task.
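To make the described approach concrete, the following is a minimal, hypothetical sketch of a PILCO-style model-based RL loop, not the authors' implementation. The paper's method uses analytic moment matching for uncertainty propagation and gradient-based policy search; this sketch substitutes Monte Carlo particle rollouts and random search so it stays short and dependency-free. All names (GPDynamics, rollout_cost, improve_policy) and all constants are illustrative assumptions.

```python
# Hypothetical sketch of a PILCO-style learning loop (not the authors' code):
# a GP dynamics model, Monte Carlo propagation of model uncertainty, and a
# saturating cost with a hinge penalty standing in for state-space constraints.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = v * exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

class GPDynamics:
    """One-step Gaussian process model of s' = f(s, a) from observed transitions."""
    def __init__(self, X, Y, noise=1e-2):
        # X: (N, state_dim + action_dim) inputs, Y: (N, state_dim) next states.
        self.X, self.Y = X, Y
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))

    def predict(self, Xs):
        """Posterior mean (Ns, state_dim) and marginal variance (Ns, 1)."""
        Ks = rbf_kernel(Xs, self.X)
        v = np.linalg.solve(self.L, Ks.T)
        var = np.clip(1.0 - (v**2).sum(0), 1e-9, None)
        return Ks @ self.alpha, var[:, None]

def rollout_cost(model, policy, s0, horizon=20, n_samples=30, seed=0):
    """Long-term planning under model uncertainty: each one-step GP prediction
    is sampled, so model uncertainty shows up as spread across particles. A
    hinge penalty approximates the paper's state-space constraints."""
    rng = np.random.default_rng(seed)
    S, total = np.tile(s0, (n_samples, 1)), 0.0
    for _ in range(horizon):
        mean, var = model.predict(np.hstack([S, policy(S)]))
        S = mean + np.sqrt(var) * rng.standard_normal(mean.shape)
        total += (1.0 - np.exp(-0.5 * (S**2).sum(1))).mean()  # saturating cost, target at origin
        total += 10.0 * np.maximum(np.abs(S) - 1.5, 0.0).sum(1).mean()  # keep states in [-1.5, 1.5]
    return total

def improve_policy(model, theta, s0, iters=50, step=0.1, seed=1):
    """Random-search improvement of a linear policy on the learned model
    (a stand-in for the analytic policy gradients used in PILCO)."""
    rng = np.random.default_rng(seed)
    best = rollout_cost(model, lambda S: S @ theta, s0)
    for _ in range(iters):
        cand = theta + step * rng.standard_normal(theta.shape)
        cost = rollout_cost(model, lambda S: S @ cand, s0)
        if cost < best:
            theta, best = cand, cost
    return theta

# Data-efficient loop: fit the model to all transitions seen so far, then
# improve the policy on the model (transitions are simulated here).
state_dim, action_dim = 2, 1
rng = np.random.default_rng(2)
X = rng.standard_normal((40, state_dim + action_dim))
Y = 0.9 * X[:, :state_dim] + 0.1 * X[:, state_dim:] + 0.01 * rng.standard_normal((40, state_dim))
model = GPDynamics(X, Y)
theta = improve_policy(model, np.zeros((state_dim, action_dim)), s0=np.ones(state_dim))
print("predicted long-term cost:", rollout_cost(model, lambda S: S @ theta, np.ones(state_dim)))
```

The key design point the sketch preserves is that planning happens entirely on the learned probabilistic model, with predictive variance carried through the whole horizon rather than discarded; this is what lets a method of this family learn from only a handful of real trials.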
