MazeBase: A Sandbox for Learning from Games

This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these games, with and without a procedurally generated curriculum. Despite the tasks' simplicity, the performance of the models is far from optimal, suggesting directions for future development. We also demonstrate the versatility of MazeBase by using it to emulate small combat scenarios from StarCraft. Models trained on the MazeBase version can be directly applied to StarCraft, where they consistently beat the in-game AI.
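
For concreteness, below is a minimal sketch of the kind of setup the abstract describes: a procedurally generated 2D grid game and a softmax policy trained with a REINFORCE-style policy gradient (the training method used in the paper). This is not MazeBase's actual API; the environment interface and all names here (GridGame, reset, step, train) are hypothetical, for illustration only.

```python
# Hedged illustration of a MazeBase-style setup: a tiny procedurally
# generated goal-reaching game plus REINFORCE training of a linear
# softmax policy. All names are hypothetical; not the MazeBase API.
import numpy as np

class GridGame:
    """Toy 2D game: reach a randomly placed goal on an N x N grid."""
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, max_steps=20, rng=None):
        self.size, self.max_steps = size, max_steps
        self.rng = rng or np.random.default_rng()

    def reset(self):
        # Procedural generation: random agent and goal positions each episode.
        self.agent = self.rng.integers(0, self.size, 2)
        self.goal = self.rng.integers(0, self.size, 2)
        self.t = 0
        return self._obs()

    def _obs(self):
        # Two one-hot grid planes (agent, goal), flattened to a vector.
        o = np.zeros((2, self.size, self.size))
        o[0, self.agent[0], self.agent[1]] = 1.0
        o[1, self.goal[0], self.goal[1]] = 1.0
        return o.ravel()

    def step(self, a):
        self.agent = np.clip(self.agent + self.ACTIONS[a], 0, self.size - 1)
        self.t += 1
        at_goal = bool((self.agent == self.goal).all())
        done = at_goal or self.t >= self.max_steps
        reward = 1.0 if at_goal else -0.05  # small per-step cost
        return self._obs(), reward, done

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train(episodes=2000, lr=0.1, size=5):
    """REINFORCE (Williams, 1992) on a linear policy: logits = W @ obs."""
    env = GridGame(size)
    W = np.zeros((4, 2 * size * size))
    for _ in range(episodes):
        obs, done, traj = env.reset(), False, []
        while not done:
            p = softmax(W @ obs)
            a = np.random.choice(4, p=p)
            nxt, r, done = env.step(a)
            traj.append((obs, a, p, r))
            obs = nxt
        G = sum(r for *_, r in traj)  # undiscounted episode return
        for o, a, p, _ in traj:
            # Gradient of log pi(a|o) for a linear softmax policy:
            # (onehot(a) - p) outer o, scaled by the return G.
            grad = -np.outer(p, o)
            grad[a] += o
            W += lr * G * grad
    return W

if __name__ == "__main__":
    W = train()
```

The paper's actual models (fully connected, convolutional, and memory networks) replace the linear policy above, and its games encode richer algorithmic structure than goal-reaching, but the train-by-policy-gradient-on-procedurally-generated-episodes loop is the same shape.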
