Hierarchical Reinforcement Learning using Gaussian Random Trajectory Generation in Autonomous Furniture Assembly

In this paper, we propose a Gaussian Random Trajectory guided Hierarchical Reinforcement Learning (GRT-HL) method for autonomous furniture assembly. The furniture assembly problem is formulated as a comprehensive, human-like, long-horizon manipulation task that requires long-term planning and sophisticated control. Our proposed model, GRT-HL, draws inspiration from semi-supervised adversarial autoencoders and learns latent representations of the position trajectories of the end-effector. The high-level policy generates an optimal trajectory for furniture assembly, considering the structural limitations of the robotic agent. Given the trajectory drawn from the high-level policy, the low-level policy makes a plan and controls the end-effector. We first evaluate the performance of GRT-HL against state-of-the-art reinforcement learning methods on furniture assembly tasks. We then demonstrate that GRT-HL successfully solves the long-horizon problem with extremely sparse rewards by generating trajectories for planning.
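
The sketch below illustrates the general idea behind Gaussian random trajectory generation referred to in the title: smooth candidate end-effector trajectories are sampled from a Gaussian-process prior conditioned on fixed start and goal anchor points. This is a minimal illustration under assumed choices (squared-exponential kernel, two anchor points, the function name `sample_gaussian_random_trajectory`, and all parameter values), not the paper's implementation of GRT-HL.

```python
import numpy as np

def sample_gaussian_random_trajectory(start, goal, n_points=20, n_samples=5,
                                      length_scale=0.5, jitter=1e-6):
    """Illustrative sketch: sample smooth random end-effector trajectories
    passing (approximately) through fixed start and goal positions, using a
    squared-exponential GP prior conditioned on the two anchor points."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    dim = start.shape[0]
    t = np.linspace(0.0, 1.0, n_points)      # query times along the path
    t_anchor = np.array([0.0, 1.0])          # anchor times (start, goal)

    def k(a, b):
        # Squared-exponential kernel between two sets of 1-D time points.
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / length_scale) ** 2)

    K_aa = k(t_anchor, t_anchor) + jitter * np.eye(2)
    K_qa = k(t, t_anchor)
    K_qq = k(t, t) + jitter * np.eye(n_points)

    K_aa_inv = np.linalg.inv(K_aa)
    anchors = np.stack([start, goal])        # (2, dim)

    # GP posterior mean and covariance over the query times.
    mean = K_qa @ K_aa_inv @ anchors                       # (n_points, dim)
    cov = K_qq - K_qa @ K_aa_inv @ K_qa.T                  # (n_points, n_points)
    L = np.linalg.cholesky(cov + jitter * np.eye(n_points))

    # Draw independent samples for each Cartesian dimension.
    samples = np.empty((n_samples, n_points, dim))
    for s in range(n_samples):
        z = np.random.randn(n_points, dim)
        samples[s] = mean + L @ z
    return samples

# Example: candidate reach trajectories in 3-D Cartesian space.
trajs = sample_gaussian_random_trajectory([0.0, 0.0, 0.1], [0.4, 0.2, 0.3])
print(trajs.shape)  # (5, 20, 3)
```

In a hierarchical setup such as the one described above, trajectories of this kind would serve as high-level guidance, with the low-level policy responsible for tracking the selected waypoints with the end-effector.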
