Efficient Searching With MCTS and Imitation Learning: A Case Study in Pommerman

Pommerman is a popular reinforcement learning environment because it poses several challenges, such as sparse and deceptive rewards and delayed action effects. In this paper, we propose an efficient reinforcement learning approach that combines Monte Carlo tree search with action pruning and flexible imitation learning to accelerate the search, allowing the agent to avoid meaningless exploration and discover high-level strategies. On the Pommerman benchmark, we evaluate the agent driven by the proposed approach against heuristic and pure reinforcement learning baselines. The results show that our method yields relatively strong agent performance in combat, demonstrating its efficiency in this specific domain and its broader potential.
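The core idea of MCTS with action pruning can be sketched as follows. This is a minimal, self-contained illustration on a hypothetical toy line-world (not the paper's Pommerman agent): a `prune_actions` filter removes obviously wasteful moves (a no-op, and stepping off the board) before expansion and during rollouts, so the search tree never wastes simulations on them. All names and the environment here are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Toy deterministic environment: the agent sits at a position on [0..10].
# Action +1 or -1 moves it; reaching 10 ends the episode with reward 1.
# A third action, 0 (noop), is always pruned, illustrating how action
# pruning shrinks the tree before the search ever sees a wasteful move.
ACTIONS = (+1, -1, 0)

def step(state, action):
    nxt = state + action
    if nxt >= 10:
        return nxt, 1.0, True   # reached the goal
    return nxt, 0.0, False

def prune_actions(state):
    """Drop obviously meaningless actions (noop, moves off the low end),
    mimicking the idea of skipping meaningless explorations."""
    return [a for a in ACTIONS if a != 0 and state + a >= 0]

class Node:
    def __init__(self, state, parent=None, terminal=False, reward=0.0):
        self.state, self.parent = state, parent
        self.terminal, self.reward = terminal, reward
        self.children = {}                       # action -> Node
        self.visits, self.value = 0, 0.0
        self.untried = [] if terminal else prune_actions(state)

def uct_select(node, c=1.4):
    # Standard UCB1 selection over the (already pruned) children.
    return max(node.children.items(),
               key=lambda kv: kv[1].value / kv[1].visits
               + c * math.sqrt(math.log(node.visits) / kv[1].visits))

def rollout(state, depth=20):
    # Random playout restricted to the pruned action set.
    for _ in range(depth):
        state, reward, done = step(state, random.choice(prune_actions(state)))
        if done:
            return reward
    return 0.0

def mcts(root_state, iters=2000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            _, node = uct_select(node)
        # Expansion + evaluation.
        if node.terminal:
            value = node.reward
        else:
            a = node.untried.pop()
            nxt, r, done = step(node.state, a)
            child = Node(nxt, parent=node, terminal=done, reward=r)
            node.children[a] = child
            node = child
            value = r if done else rollout(nxt)
        # Backpropagation.
        while node:
            node.visits += 1
            node.value += value
            node = node.parent
    best_action, _ = max(root.children.items(), key=lambda kv: kv[1].visits)
    return best_action
```

Because pruning happens in `Node.__init__` and in `rollout`, both the tree policy and the playouts ignore the useless actions; in the paper's setting the same hook would encode domain rules (e.g. not stepping into a blast radius), leaving the UCT machinery untouched.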
