Parallel Gym Gazebo: a Scalable Parallel Robot Deep Reinforcement Learning Platform

Deep reinforcement learning is making advances in robotics, supported by platforms that provide realistic environment simulation. However, as shown in this paper, realistic simulation incurs a substantial time cost, which becomes the bottleneck of the learning procedure. To address this problem in a general way, we propose a parallel reinforcement learning platform that follows the master-slave principle and integrates learning programs with multiple distributed robot simulators. The platform is intrinsically scalable and requires no modification to existing serially designed learning environments or algorithms. Experimental results demonstrate that our platform significantly accelerates the learning progress of robots, with the speedup growing in direct proportion to the parallel scale. The parallelism also brings richer exploration and sampling, enhancing the performance of deep reinforcement learning algorithms compared with existing serial platforms.
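The master-slave idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual implementation: `SlowSerialEnv` and `MasterSlaveVecEnv` are hypothetical names, the environment follows a simplified Gym-style `reset()`/`step(action)` interface returning `(obs, reward, done)`, and `time.sleep` stands in for the round-trip to a robot simulator. Threads from `concurrent.futures.ThreadPoolExecutor` play the role of the slaves; that choice assumes each slave spends most of its time waiting on its own simulator, which is where the serial time cost comes from.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


class SlowSerialEnv:
    """Hypothetical stand-in for an unmodified, serially designed robot environment.

    step() sleeps to mimic a round-trip to an external simulator such as Gazebo;
    the dynamics are a toy 1-D walk, not a real robot model.
    """

    def __init__(self, latency: float = 0.01):
        self.latency = latency
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        time.sleep(self.latency)  # waiting on the simulator dominates the cost
        self.pos += 1 if action == 1 else -1
        reward = 1.0 if self.pos > 0 else 0.0
        done = abs(self.pos) >= 5
        return self.pos, reward, done


class MasterSlaveVecEnv:
    """Hypothetical master that fans one action out to each slave and gathers results.

    Each slave wraps one serial environment as-is; the environment class itself
    is never modified. Threads stand in for the separate slave processes (or
    machines) a real deployment would use.
    """

    def __init__(self, env_fns: List[Callable[[], SlowSerialEnv]]):
        self.envs = [make_env() for make_env in env_fns]
        self.pool = ThreadPoolExecutor(max_workers=len(self.envs))

    def reset(self) -> List[int]:
        return list(self.pool.map(lambda env: env.reset(), self.envs))

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        def run_slave(env: SlowSerialEnv, action: int) -> Tuple[int, float, bool]:
            obs, reward, done = env.step(action)
            if done:  # auto-reset so the master always receives a live observation
                obs = env.reset()
            return obs, reward, done

        # All slaves step concurrently; map() returns results in slave order.
        results = list(self.pool.map(run_slave, self.envs, actions))
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

    def close(self) -> None:
        self.pool.shutdown()


if __name__ == "__main__":
    n_slaves, n_steps = 8, 20
    vec = MasterSlaveVecEnv([lambda: SlowSerialEnv(latency=0.01) for _ in range(n_slaves)])
    vec.reset()
    start = time.perf_counter()
    for _ in range(n_steps):
        obs, rewards, dones = vec.step([1] * n_slaves)
    elapsed = time.perf_counter() - start
    vec.close()
    print(f"{n_slaves * n_steps} transitions in {elapsed:.2f} s")
```

Because the environment class is only instantiated and called, never edited, the same wrapper would accept any serial environment exposing this interface. Under these assumptions, 20 master steps with 8 slaves and 10 ms of simulated latency should take roughly 0.2 s instead of the 1.6 s a serial loop would need, ignoring thread overhead, which illustrates why the speedup can grow with the number of slaves when simulation dominates the cost.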