Paper information - Monolithic vs. hybrid controller for multi-objective Sim-to-Real learning


Simulation to real (Sim-to-Real) is an attractive approach for constructing controllers for robotic tasks that are easier to simulate than to solve analytically. Working Sim-to-Real solutions have been demonstrated for tasks with a single clear objective, such as "reach the target". Real-world applications, however, often involve multiple simultaneous objectives, such as "reach the target" and "avoid obstacles". A straightforward solution in the context of reinforcement learning (RL) is to combine the objectives into a multi-term reward function and train a single monolithic controller. Recently, a hybrid solution was proposed that is based on pre-trained single-objective controllers and a switching rule between them. In this work, we compare these two approaches in a multi-objective setting in which a robot manipulator must reach a target while avoiding an obstacle. Our findings show that a hybrid controller is easier to train and obtains a better success-failure trade-off than a monolithic controller. The controllers trained in simulation were verified on a real set-up.
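The contrast between the two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the reward terms, the weights `w_reach` and `w_avoid`, the thresholds `safe_radius` and `switch_radius`, and the distance-based switching rule are all assumptions chosen only to make the idea concrete.

```python
def monolithic_reward(dist_to_target, dist_to_obstacle,
                      w_reach=1.0, w_avoid=0.5, safe_radius=0.2):
    """Multi-term reward for a single monolithic controller.

    All weights and the safe radius are hypothetical values.
    """
    # Reaching term: the closer to the target, the higher the reward.
    reach_term = -dist_to_target
    # Avoidance term: penalize only when inside the assumed safe radius.
    avoid_term = -max(0.0, safe_radius - dist_to_obstacle)
    return w_reach * reach_term + w_avoid * avoid_term


def hybrid_action(state, dist_to_obstacle, reach_policy, avoid_policy,
                  switch_radius=0.2):
    """Hybrid controller: switch between two pre-trained policies.

    The distance-threshold rule is an assumed, illustrative switching rule.
    """
    if dist_to_obstacle < switch_radius:
        return avoid_policy(state)
    return reach_policy(state)
```

In the monolithic case the trade-off between objectives is fixed by the reward weights before training, whereas in the hybrid case each policy is trained on its own objective and the trade-off is set by the switching rule.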
