Monolithic vs. hybrid controller for multi-objective Sim-to-Real learning

Simulation-to-real (Sim-to-Real) learning is an attractive approach for constructing controllers for robotic tasks that are easier to simulate than to solve analytically. Working Sim-to-Real solutions have been demonstrated for tasks with a single clear objective, such as "reach the target". Real-world applications, however, often involve multiple simultaneous objectives, such as "reach the target" while "avoiding obstacles". A straightforward solution in the context of reinforcement learning (RL) is to combine the objectives into a multi-term reward function and train a single monolithic controller. Recently, a hybrid solution based on pre-trained single-objective controllers and a switching rule between them was proposed. In this work, we compare these two approaches in a multi-objective setting in which a robot manipulator must reach a target while avoiding an obstacle. Our findings show that the hybrid controller is easier to train and obtains a better success-failure trade-off than the monolithic controller. The controllers trained in simulation were verified on a real setup.
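
To make the contrast between the two approaches concrete, the following is a minimal Python sketch and not the implementation evaluated in this work. The reward terms, the weights `w_reach` and `w_avoid`, the distance-threshold switching rule, the radii, and the stand-in policies `reach_policy` and `avoid_policy` are all hypothetical choices made for illustration. In practice the two policies would be pre-trained single-objective RL controllers.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(a, b)


def monolithic_reward(ee, target, obstacle,
                      w_reach=1.0, w_avoid=0.5, safe_radius=0.2):
    """Multi-term reward for a single monolithic controller (assumed form).

    Weighted sum of a reaching term (negative distance to the target) and
    an avoidance penalty that is active only inside `safe_radius`.
    """
    reach_term = -distance(ee, target)
    avoid_term = -max(0.0, safe_radius - distance(ee, obstacle))
    return w_reach * reach_term + w_avoid * avoid_term


def hybrid_action(ee, target, obstacle, reach_policy, avoid_policy,
                  switch_radius=0.2):
    """Hybrid controller: a switching rule over two single-objective policies.

    Assumed rule: hand control to the avoidance policy whenever the
    end effector is closer to the obstacle than `switch_radius`.
    """
    if distance(ee, obstacle) < switch_radius:
        return avoid_policy(ee, obstacle)
    return reach_policy(ee, target)


def unit_step(src, dst):
    """Unit direction vector pointing from `src` to `dst`."""
    d = distance(src, dst)
    if d == 0.0:
        return tuple(0.0 for _ in src)
    return tuple((q - p) / d for p, q in zip(src, dst))


# Hypothetical stand-ins for pre-trained single-objective controllers.
def reach_policy(ee, target):
    return unit_step(ee, target)      # move toward the target


def avoid_policy(ee, obstacle):
    return unit_step(obstacle, ee)    # move away from the obstacle


if __name__ == "__main__":
    ee, target = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    far_obstacle, near_obstacle = (0.0, 1.0, 0.0), (0.0, 0.1, 0.0)
    print(monolithic_reward(ee, target, far_obstacle))
    print(hybrid_action(ee, target, far_obstacle, reach_policy, avoid_policy))
    print(hybrid_action(ee, target, near_obstacle, reach_policy, avoid_policy))
```

In this sketch the monolithic approach has to balance both objectives through the reward weights during a single training run, whereas the hybrid approach keeps the objectives separate and only needs the switching threshold to be chosen.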
