Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or the reward model, few efforts have focused on the choice of action space (e.g., joint or end-effector space; position, velocity, or torque commands). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for variable impedance control in end-effector space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (a task with no contact), Door Opening (a task with kinematic constraints), and Surface Wiping (a task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES transfer across different robot models in simulation, and from simulation to the real world for the same robot. Further information is available at https://stanfordvl.github.io/vices.
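To make the action space concrete, below is a minimal Python sketch of how a VICES-style action could be mapped to joint torques. It assumes the policy outputs a 6-D end-effector displacement together with 6 per-axis stiffness gains, with damping derived for critical damping; the function name, the fixed 6-D parameterization, and the omission of the inertia-weighted operational-space terms and gravity-model details are simplifications for illustration, not the paper's exact controller.

import numpy as np

def vices_to_torques(action, J, dq, tau_g):
    """Map one VICES action to joint torques (impedance-law sketch).

    action : (12,) array -- first 6 entries are the commanded
             end-effector displacement delta_x (position + orientation),
             last 6 are the per-axis stiffness gains k_p chosen by
             the RL policy at this control step.
    J      : (6, n) end-effector Jacobian at the current configuration.
    dq     : (n,) joint velocities.
    tau_g  : (n,) gravity-compensation torques (from a dynamics model).
    """
    delta_x, k_p = action[:6], action[6:]
    k_d = 2.0 * np.sqrt(k_p)        # critical damping per axis
    x_dot = J @ dq                  # current end-effector twist
    # Desired wrench: spring toward the commanded set-point,
    # damper acting on the current end-effector velocity.
    f = k_p * delta_x - k_d * x_dot
    return J.T @ f + tau_g          # map wrench to joint torques

# Hypothetical usage: a 7-DoF arm; the policy proposes a small push
# along +z with moderate stiffness on all six axes.
n = 7
J = np.random.randn(6, n)           # placeholder Jacobian
dq = np.zeros(n)
tau_g = np.zeros(n)
action = np.concatenate([[0, 0, 0.01, 0, 0, 0], np.full(6, 150.0)])
tau = vices_to_torques(action, J, dq, tau_g)

Because the stiffness gains are part of the action, a policy in this space can command stiff motion in free space and become compliant on contact, which is the property the paper credits for the gains in safety, energy consumption, and transfer.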
