GLUE: Enhancing Compatibility and Flexibility of Reinforcement Learning Platforms by Decoupling Algorithms and Environments

Reinforcement Learning (RL) platforms play an important role in translating the rapid advances of RL algorithms into successes on real-world tasks. These platforms integrate multiple simulation environments, allowing RL algorithms to be tested, evaluated, and finally applied in different scenarios. However, the algorithm code must execute in the same runtime system as the underlying environments, which limits a platform's compatibility when adapting an algorithm and its flexibility when switching between different algorithms. We propose GLUE to resolve this issue: it first decouples the execution of algorithms and environments, and then leverages the RPC protocol to orchestrate a seamless workflow between them. GLUE is further implemented as a library that hides the handling of language-specific RPCs from users. We evaluate GLUE by adapting 6 RL algorithm implementations to a representative RL platform. Compared with the baseline approach, GLUE enables the algorithms to achieve competitive performance while reducing the lines of algorithm code that must be changed during adaptation by 27.77%, at the cost of 5.40% longer training time on average.
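
To make the decoupling concrete, the following sketch places a Gym-style environment behind Python's standard xmlrpc module, so that the algorithm code interacts only with a remote proxy instead of importing the simulator. This is a minimal illustration of the idea, not GLUE's actual library or API: the names serve_environment and RemoteEnv, the port, the classic Gym reset/step signatures, and the use of XML-RPC (in place of GLUE's language-specific RPC handling) are all assumptions made for the example.

# --- Environment side: runs in its own runtime/process ----------------------
from xmlrpc.server import SimpleXMLRPCServer

import gym  # the simulator dependency lives only in this process (assumed)


def serve_environment(env_id="CartPole-v1", port=8000):
    """Expose a Gym-style environment over RPC (illustrative only)."""
    env = gym.make(env_id)
    server = SimpleXMLRPCServer(("localhost", port), allow_none=True)

    def reset():
        obs = env.reset()  # classic Gym API assumed: reset() returns obs
        return obs.tolist()  # marshal the NumPy array into plain Python floats

    def step(action):
        obs, reward, done, info = env.step(action)
        return [obs.tolist(), float(reward), bool(done), dict(info)]

    server.register_function(reset, "reset")
    server.register_function(step, "step")
    server.serve_forever()


# --- Algorithm side: sees only a Gym-like proxy, never the simulator --------
import xmlrpc.client


class RemoteEnv:
    """Gym-like facade; existing algorithm code can call it unchanged."""

    def __init__(self, url="http://localhost:8000"):
        self._proxy = xmlrpc.client.ServerProxy(url, allow_none=True)

    def reset(self):
        return self._proxy.reset()

    def step(self, action):
        obs, reward, done, info = self._proxy.step(action)
        return obs, reward, done, info


# Algorithm-side usage (assumes serve_environment() is already running in a
# separate process):
#   env = RemoteEnv()
#   obs = env.reset()
#   obs, reward, done, info = env.step(0)

Because the algorithm only depends on the proxy's reset/step interface, the environment can run in a different process, machine, or even language runtime, which is the property the abstract attributes to the decoupled design.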
