Reinforcement Learning With Sequences of Motion Primitives for Robust Manipulation

Physical contact events often allow a natural decomposition of manipulation tasks into action phases and subgoals. Within the motion primitive paradigm, each action phase corresponds to a motion primitive, and the subgoals correspond to the goal parameters of these primitives. Current state-of-the-art reinforcement learning algorithms are able to efficiently and robustly optimize the parameters of motion primitives in very high-dimensional problems. These algorithms often consider only shape parameters, which determine the trajectory between the start- and end-point of the movement. In manipulation, however, it is also crucial to optimize the goal parameters, which represent the subgoals between the motion primitives. We therefore extend the policy improvement with path integrals (PI2) algorithm to simultaneously optimize shape and goal parameters. Applying simultaneous shape and goal learning to sequences of motion primitives leads to the novel algorithm PI2 Seq. We use our methods to address a fundamental challenge in manipulation: improving the robustness of everyday pick-and-place tasks.
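
To make the joint shape-and-goal update concrete, here is a minimal sketch of an episodic, reward-weighted-averaging update in the spirit of PI2, applied simultaneously to a primitive's shape weights and its goal. This is a simplified illustration under stated assumptions, not the paper's implementation: the names (`pi2_update`, `rollout_cost`), the exploration noise levels, and the toy quadratic cost are placeholders invented for this example, and a real system would obtain the cost by executing the motion primitive on the robot.

```python
import numpy as np

# Simplified, episodic PI2-style update (reward-weighted averaging of
# exploration noise), applied jointly to the shape weights and the goal
# of a single motion primitive. All names and settings here are
# illustrative assumptions, not the paper's actual code.

def pi2_update(theta, goal, rollout_cost, n_rollouts=20,
               sigma_theta=0.1, sigma_goal=0.02, h=10.0, rng=None):
    """One update of shape parameters `theta` and goal parameters `goal`.

    rollout_cost(theta, goal) -> scalar cost of executing the primitive
    with the given (perturbed) parameters.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Sample exploration noise for both shape and goal parameters.
    eps_theta = rng.normal(0.0, sigma_theta, size=(n_rollouts, theta.size))
    eps_goal = rng.normal(0.0, sigma_goal, size=(n_rollouts, goal.size))

    # Evaluate the cost of each noisy rollout.
    costs = np.array([rollout_cost(theta + et, goal + eg)
                      for et, eg in zip(eps_theta, eps_goal)])

    # Map costs to probabilities: low cost -> high probability.
    c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-10)
    probs = np.exp(-h * c)
    probs /= probs.sum()

    # Probability-weighted averaging of the exploration noise.
    theta_new = theta + probs @ eps_theta
    goal_new = goal + probs @ eps_goal
    return theta_new, goal_new


if __name__ == "__main__":
    # Toy stand-in for a real rollout: a quadratic cost around a target
    # parameter vector and target goal.
    target_theta = np.ones(10)
    target_goal = np.array([0.4, -0.2, 0.1])

    def cost(theta, goal):
        return (np.sum((theta - target_theta) ** 2)
                + 10.0 * np.sum((goal - target_goal) ** 2))

    theta, goal = np.zeros(10), np.zeros(3)
    print("initial cost:", cost(theta, goal))
    for _ in range(100):
        theta, goal = pi2_update(theta, goal, cost)
    print("final cost:  ", cost(theta, goal))
```

Extending this sketch to a sequence of primitives, as in the proposed approach, would amount to perturbing and updating the shape and goal parameters of each primitive in the sequence, with the cost accumulated over the entire sequence; that sequencing logic is omitted here.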
