Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space

We propose a hierarchical architecture for the advantage function to improve the performance of reinforcement learning in a parameterized action space, which consists of a set of discrete actions, each associated with a set of continuous parameters. The hierarchical architecture extends the actor-critic architecture with two specialized advantage functions, one for the discrete actions and one for the continuous parameters, to estimate a better baseline. We incorporate this architecture into proximal policy optimization, yielding a method we refer to as HA-PPO. We evaluate our methods on the Half Field Offense domain and find that the hierarchical architecture of the advantage function, referred to as the hierarchical advantage, helps stabilize learning and leads to better performance.
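
To make the idea concrete, below is a minimal PyTorch sketch of a critic with a hierarchical advantage: a shared state-value head V(s) plus two specialized advantage heads, one scoring the discrete actions and one scoring the continuous parameters conditioned on the chosen discrete action. The class name, layer sizes, and exact head layout are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of a hierarchical-advantage critic (assumed PyTorch layout).
    import torch
    import torch.nn as nn

    class HierarchicalAdvantageCritic(nn.Module):
        def __init__(self, obs_dim, n_discrete, param_dim, hidden=128):
            super().__init__()
            # Shared feature trunk over the observation.
            self.body = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Shared baseline V(s).
            self.value_head = nn.Linear(hidden, 1)
            # Advantage head over the discrete actions: one score per action.
            self.disc_adv_head = nn.Linear(hidden, n_discrete)
            # Advantage head over the continuous parameters, conditioned on
            # the chosen discrete action (one-hot) and its parameters.
            self.param_adv_head = nn.Sequential(
                nn.Linear(hidden + n_discrete + param_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, discrete_onehot, params):
            h = self.body(obs)
            value = self.value_head(h)
            # Select the advantage of the chosen discrete action via the one-hot mask.
            adv_discrete = (self.disc_adv_head(h) * discrete_onehot).sum(-1, keepdim=True)
            adv_params = self.param_adv_head(
                torch.cat([h, discrete_onehot, params], dim=-1)
            )
            return value, adv_discrete, adv_params

In this sketch, the discrete head provides a baseline specialized to the discrete action choice, while the parameter head provides one specialized to the continuous parameters of that choice; how the two advantage estimates are combined into the PPO objective follows the paper and may differ from this layout.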
