Learning a Skill-sequence-dependent Policy for Long-horizon Manipulation Tasks

In recent years, the robotics community has made substantial progress in robotic manipulation using deep reinforcement learning (RL), yet effectively learning long-horizon tasks remains challenging. Typical RL-based methods approximate long-horizon tasks as Markov decision processes and take only the current observation (images or other sensor information) as the input state. However, this approximation ignores the fact that the skill sequence executed so far also plays a crucial role in long-horizon tasks. In this paper, we take both the observation and the skill sequence into account and propose a skill-sequence-dependent hierarchical policy for solving a typical long-horizon task. The proposed policy consists of a high-level skill policy (conditioned on the skill sequence) and a low-level parameter policy (conditioned on the observation), together with corresponding training methods, which makes learning substantially more sample-efficient. Experiments in simulation demonstrate that our approach successfully solves a long-horizon task and learns significantly faster than Proximal Policy Optimization (PPO) and the task-schema method.
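The two-level decomposition described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the class names `SkillSequencePolicy` and `ParameterPolicy`, the skill vocabulary, and the tabular Q-values are all assumptions made for clarity. The key point it shows is that the high-level policy is keyed on the *history of executed skills*, while the low-level policy maps the *current observation* to continuous skill parameters.

```python
# Hypothetical skill vocabulary for a pick-and-place-style long-horizon task.
SKILLS = ["reach", "grasp", "move", "place"]

class SkillSequencePolicy:
    """High-level policy: picks the next skill conditioned on the
    sequence of skills executed so far, not just the current observation."""
    def __init__(self, skills):
        self.skills = skills
        # Tabular Q-values keyed by the executed-skill history (a tuple).
        self.q = {}

    def act(self, history):
        key = tuple(history)
        values = self.q.setdefault(key, [0.0] * len(self.skills))
        best = max(range(len(self.skills)), key=lambda i: values[i])
        return self.skills[best]

class ParameterPolicy:
    """Low-level policy: maps the current observation to continuous
    parameters (e.g., a target pose) for the chosen skill."""
    def __init__(self, weights):
        self.weights = weights  # one weight vector per skill

    def act(self, skill, observation):
        w = self.weights[skill]
        return [wi * oi for wi, oi in zip(w, observation)]

# Roll out a few steps of the two-level policy.
high = SkillSequencePolicy(SKILLS)
low = ParameterPolicy({s: [1.0, 0.5] for s in SKILLS})
history, obs = [], [0.2, 0.4]
for _ in range(3):
    skill = high.act(history)        # depends on the skill sequence
    params = low.act(skill, obs)     # depends on the observation
    history.append(skill)
```

With untrained (all-zero) Q-values the high-level policy simply ties-break to the first skill; in the paper's setting the two levels are trained with their own methods so that the skill choice adapts to the history and the parameters adapt to the scene.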

[1] Jeannette Bohg et al. Learning Hierarchical Control for Robust In-Hand Manipulation, 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[2] Sergey Levine et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.

[3] Silvio Savarese et al. ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, 2020, arXiv.

[4] Peter Corke et al. Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach, 2018, Robotics: Science and Systems.

[5] Danica Kragic et al. Learning and Evaluation of the Approach Vector for Automatic Grasp Generation and Planning, 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[6] Tamim Asfour et al. Integrated Grasp Planning and Visual Object Localization For a Humanoid Robot with Five-Fingered Hands, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7] Milos Hauskrecht et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.

[8] Song-Chun Zhu et al. Robot learning with a spatial, temporal, and causal and-or graph, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[9] Minoru Asada et al. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning, 2005, Machine Learning.

[10] Ken Goldberg et al. Learning ambidextrous robot grasping policies, 2019, Science Robotics.

[11] Sanjay Krishnan et al. HIRL: Hierarchical Inverse Reinforcement Learning for Long-Horizon Tasks with Delayed Rewards, 2016, arXiv.

[12] David Silver et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[13] Chelsea Finn et al. Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation, 2019, ICLR.

[14] Pieter Abbeel et al. Learning Predictive Representations for Deformable Objects Using Contrastive Estimation, 2020, CoRL.

[15] Abhinav Gupta et al. Efficient Bimanual Manipulation Using Learned Task Schemas, 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[16] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.

[17] Silvio Savarese et al. HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators, 2019, CoRL.

[18] Roberto Martín-Martín et al. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning, 2020, arXiv.

[19] Silvio Savarese et al. Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations, 2020, Robotics: Science and Systems.

[20] Karol Hausman et al. Modeling Long-horizon Tasks as Sequential Interaction Landscapes, 2020, CoRL.

[21] Atil Iscen et al. Hierarchical Reinforcement Learning for Quadruped Locomotion, 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22] Sergey Levine et al. Optimal control with learned local models: Application to dexterous manipulation, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[23] Xinyu Liu et al. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, 2017, Robotics: Science and Systems.

[24] Yuval Tassa et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, arXiv.

[25] Ludovic Righetti et al. Learning Variable Impedance Control for Contact Sensitive Tasks, 2019, IEEE Robotics and Automation Letters.

[26] Sergey Levine et al. Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, 2019, CoRL.

[27] Song-Chun Zhu et al. A tale of two explanations: Enhancing human trust by explaining robot behavior, 2019, Science Robotics.

[28] Marcin Andrychowicz et al. Solving Rubik's Cube with a Robot Hand, 2019, arXiv.

[29] Joseph Redmon et al. Real-time grasp detection using convolutional neural networks, 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[30] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.

[31] Marcos R. O. A. Maximo et al. Learning Humanoid Robot Running Skills through Proximal Policy Optimization, 2019, 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE).

[32] Hussein A. Abbass et al. Hierarchical Deep Reinforcement Learning for Continuous Action Control, 2018, IEEE Transactions on Neural Networks and Learning Systems.