Paper information - Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space

Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space

In this paper, we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space. The architecture consists of multiple parallel sub-actor networks, which decompose the structured action space into simpler action spaces, and a single critic network that guides the training of all sub-actor networks. While this paper focuses mainly on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO outperforms previous methods for parameterized action reinforcement learning.
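
As a rough illustration of the decomposition described in the abstract, the sketch below samples a parameterized action from two parallel sub-actors and evaluates the standard PPO clipped surrogate term. It is a minimal sketch under stated assumptions, not the authors' implementation: the split into exactly one discrete and one continuous sub-actor, the toy action names in `PARAM_DIMS`, the hand-written stand-in functions used in place of neural networks, and the fixed standard deviation are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy action space: each discrete action carries its own number
# of continuous parameters (names are illustrative, not taken from the paper).
PARAM_DIMS = {"dash": 2, "turn": 1, "kick": 2}
ACTIONS = list(PARAM_DIMS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_actor(state):
    """Stand-in for a discrete sub-actor: state -> probabilities over ACTIONS."""
    logits = [0.1 * (i + 1) * sum(state) for i in range(len(ACTIONS))]
    return softmax(logits)


def continuous_actor(state):
    """Stand-in for a continuous sub-actor: state -> Gaussian means for the
    parameters of every discrete action."""
    s = sum(state)
    return {a: [math.tanh(s + j) for j in range(d)] for a, d in PARAM_DIMS.items()}


def critic(state):
    """Stand-in for the single critic: state -> scalar value estimate."""
    return 0.5 * sum(state)


def sample_hybrid_action(state, rng, std=0.1):
    """Pick a discrete action, then sample only that action's parameters."""
    probs = discrete_actor(state)
    k = rng.choices(range(len(ACTIONS)), weights=probs, k=1)[0]
    name = ACTIONS[k]
    means = continuous_actor(state)[name]
    params = [rng.gauss(mu, std) for mu in means]
    return name, params, probs[k]


def ppo_clipped_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.1, -0.2, 0.3]
    name, params, prob = sample_hybrid_action(state, rng)
    print(name, params, prob, critic(state))
```

In this sketch, the advantage estimate passed to `ppo_clipped_term` would be derived from the one critic, and the same term would be evaluated separately with each sub-actor's own probability ratio. That is one plausible reading of "a critic network to guide the training of all sub-actor networks"; the abstract does not specify the exact update rule.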
