LASER: Learning a Latent Action Space for Efficient Reinforcement Learning

The process of learning a manipulation task depends strongly on the action space used for exploration: posed in the wrong action space, solving a task with reinforcement learning can be drastically inefficient. Additionally, similar tasks, or instances of the same task family, impose latent manifold constraints on the most effective action space: the task family can be best solved with actions restricted to a manifold within the robot's full action space. Combining these insights, we present LASER, a method to learn latent action spaces for efficient reinforcement learning. LASER factorizes the learning problem into two sub-problems, namely action space learning and policy learning in the new action space. It leverages data from similar manipulation task instances, either from an offline expert or collected online during policy learning, and learns from these trajectories a mapping from the original action space to a latent action space. LASER is trained as a variational encoder-decoder model that maps raw actions into a disentangled latent action space while maintaining action reconstruction and latent-space dynamics consistency. We evaluate LASER on two contact-rich robotic tasks in simulation and analyze the benefit of policy learning in the generated latent action space. We show improved sample efficiency compared to the original action space, stemming from better alignment of the action space with the task space, as observed in visualizations of the learned action space manifold. Additional details: pair.toronto.edu/laser
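
To make the encoder-decoder structure described above concrete, the sketch below shows one plausible way to set up a state-conditioned variational model with an action reconstruction loss, a beta-weighted KL term for disentanglement, and a latent-dynamics consistency term. This is a minimal illustration, not the authors' implementation: all module names, network sizes, and loss weights (`beta`, `lam`) are assumptions made for the example.

```python
# Illustrative sketch of a LASER-style latent action encoder-decoder.
# Architecture details and loss weights are assumptions, not the paper's code.
import torch
import torch.nn as nn


class LatentActionVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim, hidden=128):
        super().__init__()
        # Encoder: (state, raw action) -> parameters of q(z | s, a)
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder: (state, latent action) -> reconstructed raw action
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        # Latent dynamics model: (state, latent action) -> predicted next state
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        mu, log_var = self.encoder(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
        action_hat = self.decoder(torch.cat([state, z], dim=-1))
        next_state_hat = self.dynamics(torch.cat([state, z], dim=-1))
        return action_hat, next_state_hat, mu, log_var


def laser_style_loss(model, state, action, next_state, beta=1.0, lam=1.0):
    """Action reconstruction + beta-weighted KL + latent dynamics consistency (weights illustrative)."""
    action_hat, next_state_hat, mu, log_var = model(state, action)
    recon = ((action_hat - action) ** 2).sum(-1).mean()
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(-1).mean()
    dyn = ((next_state_hat - next_state) ** 2).sum(-1).mean()
    return recon + beta * kl + lam * dyn
```

Under this kind of setup, the downstream RL policy would output latent actions z, and the (typically frozen) decoder would map them back to raw robot commands, so policy learning proceeds entirely in the learned latent action space.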
