Towards Hierarchical Task Decomposition using Deep Reinforcement Learning for Pick and Place Subtasks

Deep Reinforcement Learning (DRL) is emerging as a promising approach to generating adaptive behaviors for robotic platforms. However, a major drawback of DRL is its data-hungry training regime, which requires millions of trial-and-error attempts and is impractical when running experiments on physical robotic systems. Learning from Demonstrations (LfD) addresses this issue by cloning the behavior of expert demonstrations, but LfD requires a large number of demonstrations that are difficult to acquire because dedicated, complex setups are needed. To overcome these limitations, we propose a multi-subtask reinforcement learning methodology in which complex pick-and-place tasks are decomposed into low-level subtasks. Each subtask is parametrized as an expert network and learned via DRL. Trained subtasks are then combined by a high-level choreographer to accomplish the intended pick-and-place task from different initial configurations. As a testbed, we use a pick-and-place robotic simulator to demonstrate our methodology and show that it outperforms an LfD-based benchmark in terms of sample efficiency. We further transfer the learned policy to the real robotic system and demonstrate robust grasping of objects with various geometric shapes.
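The control flow described above can be illustrated with a minimal sketch. This is not the paper's implementation: the subtask names (reach, grasp, move, place), the sequential phase-advance rule, and the stubbed policies are all assumptions made for illustration; in the actual method each expert would be a DRL-trained network mapping states to continuous actions.

```python
# Illustrative sketch only: a high-level choreographer that dispatches to
# pre-trained low-level subtask experts. Subtask names and the sequential
# phase logic are assumptions, not the paper's exact design.

class SubtaskPolicy:
    """Stand-in for a DRL-trained expert network for one subtask."""

    def __init__(self, name):
        self.name = name

    def act(self, state):
        # A trained network would map the state to a motor action;
        # here we just return a tag identifying the active expert.
        return f"{self.name}-action"


class Choreographer:
    """Selects which subtask expert is active for the current phase."""

    def __init__(self, subtasks):
        self.subtasks = subtasks  # ordered pick-and-place phases
        self.phase = 0

    def act(self, state):
        return self.subtasks[self.phase].act(state)

    def advance(self):
        # Called when the current subtask reports success.
        self.phase = min(self.phase + 1, len(self.subtasks) - 1)


experts = [SubtaskPolicy(n) for n in ("reach", "grasp", "move", "place")]
choreo = Choreographer(experts)

actions = []
for _ in experts:
    actions.append(choreo.act(state=None))
    choreo.advance()
print(actions)  # ['reach-action', 'grasp-action', 'move-action', 'place-action']
```

In the full system, the choreographer itself would be learned (or condition on the observed configuration) rather than stepping through a fixed phase order, which is what lets the method handle different initial configurations.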
