Concurrent Hierarchical Reinforcement Learning for RoboCup Keepaway

RoboCup Keepaway, which originated from the RoboCup soccer simulation 2D challenge, has been widely used as a machine learning benchmark. In this paper, we present a concurrent hierarchical reinforcement learning approach to RoboCup Keepaway. Following the idea of hierarchies of abstract machines (HAMs), we write a partial policy as a HAM from the perspective of a single keeper, run multiple instances of the HAM concurrently, and use reinforcement learning to learn the optimal completion of the resulting joint HAM. Furthermore, we exploit the internal transitions within the HAM structure to make learning more efficient. Experimental results confirm that the concurrent HAM approaches significantly outperform the state of the art on the highly complex RoboCup Keepaway domain.
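As a rough illustration of what "learning the completion of a partial policy" can look like, the sketch below applies tabular SMDP-style Q-learning at a single choice point of a keeper's partial policy. It is not the authors' implementation: the choice set `CHOICES`, the class `ChoicePointLearner`, and the string state labels are hypothetical, and the sketch covers only one keeper rather than the joint HAM formed by several concurrent instances.

```python
import random
from collections import defaultdict

# Hypothetical options available at a choice point (not taken from the paper).
CHOICES = ["hold", "pass_to_nearest", "pass_to_farthest"]


class ChoicePointLearner:
    """Tabular SMDP-style Q-learning over (state, choice) pairs.

    The hand-written partial policy fixes everything except what to do at
    choice points; learning fills in those decisions.
    """

    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # Q-values default to 0.0
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy selection among the options at a choice point."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(CHOICES)
        return max(CHOICES, key=lambda c: self.q[(state, c)])

    def update(self, state, choice, reward, steps, next_state, terminal=False):
        """Update after reaching the next choice point.

        `reward` is the (discounted) reward accumulated over the `steps`
        primitive time steps that elapsed between the two choice points.
        """
        if terminal:
            best_next = 0.0
        else:
            best_next = max(self.q[(next_state, c)] for c in CHOICES)
        target = reward + (self.gamma ** steps) * best_next
        self.q[(state, choice)] += self.alpha * (target - self.q[(state, choice)])
```

In a full agent, `choose` would be called each time the running machine reaches a choice point, and `update` would be called when the next choice point (or the end of the episode) is reached.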

Xiaoping Chen | Stuart J. Russell | Aijun Bai
