Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models

A core challenge for an autonomous agent acting in the real world is adapting its repertoire of skills to cope with noisy perception and dynamics. To scale skill learning to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner through trajectories, rather than making an isolated decision at each time step. To this end, we propose the Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques of learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method uses sensors that are available only during the execution of previously learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task completion rewards. Videos, code, and pre-trained models are available at http://sac-gmm.cs.uni-freiburg.de.
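To make the idea of adapting a skill "in its own trajectory distribution space" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the skill is a Gaussian mixture over joint position and velocity, that the velocity command is obtained by Gaussian mixture regression, and that a refinement action is a vector of additive changes to the mixture priors and means. The function names `gmr_velocity` and `adapt_gmm`, and the choice to leave the covariances untouched, are hypothetical choices made for illustration.

```python
import numpy as np


def gmr_velocity(x, priors, means, covs):
    """Gaussian mixture regression: expected velocity at position x.

    Each component k models the joint vector [position, velocity], so
    means[k] has length 2*d and covs[k] has shape (2*d, 2*d).
    """
    d = x.shape[0]
    weights = np.empty(len(priors))
    cond_means = np.empty((len(priors), d))
    for k, (pi, mu, sigma) in enumerate(zip(priors, means, covs)):
        mu_x, mu_v = mu[:d], mu[d:]
        s_xx, s_vx = sigma[:d, :d], sigma[d:, :d]
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        # Unnormalised responsibility of component k for position x.
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(s_xx))
        weights[k] = pi * np.exp(-0.5 * diff @ sol) / norm
        # Conditional mean of the velocity given the position.
        cond_means[k] = mu_v + s_vx @ sol
    weights /= weights.sum()
    return weights @ cond_means


def adapt_gmm(priors, means, action):
    """Apply a refinement action (assumed layout: prior deltas, then mean deltas)."""
    k = len(priors)
    d_priors = action[:k]
    d_means = action[k:].reshape(means.shape)
    new_priors = np.clip(priors + d_priors, 1e-6, None)
    new_priors /= new_priors.sum()  # keep the priors a valid distribution
    return new_priors, means + d_means
```

In such a setup, a reinforcement learning agent would output `action` only occasionally, and the adapted mixture would then generate the motion for the following stretch of the trajectory; how often the agent acts and how rewards are assigned are left open here.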
