Superstition in the Network: Deep Reinforcement Learning Plays Deceptive Games

Deep reinforcement learning has learned to play many games well, but it has failed on others. To better characterize the modes and reasons of failure of deep reinforcement learners, we test the widely used Advantage Actor-Critic (A2C) algorithm on four deceptive games, which are specially designed to challenge game-playing agents. These games are implemented in the General Video Game AI (GVGAI) framework, which allows us to compare the behavior of reinforcement-learning agents with that of planning agents based on tree search. We find that several of these games reliably deceive deep reinforcement learners, and that the resulting behavior highlights shortcomings of the learning algorithm. The particular ways in which the learning agents fail differ from how the planning-based agents fail, further illuminating the character of these algorithms. We propose an initial typology of deceptions which could help us better understand the pitfalls and failure modes of (deep) reinforcement learning.
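For concreteness, the sketch below shows the kind of A2C update the tested agent relies on: a shared network with a policy head and a value head, trained with a combined policy-gradient loss, value-regression loss, and entropy bonus. This is a minimal illustration under stated assumptions, not the authors' implementation; the network shape, hyperparameters, and helper names are illustrative only.

# Minimal A2C update sketch (PyTorch). Assumes a discrete action space and
# that `returns` holds n-step discounted returns computed from a rollout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """One synchronous A2C loss over a rollout batch."""
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()            # advantage estimate
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)          # critic regression
    entropy = dist.entropy().mean()                   # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# Usage: one gradient step on a collected rollout (dummy data for illustration).
model = ActorCritic(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=7e-4)
obs = torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
returns = torch.randn(32)
loss = a2c_loss(model, obs, actions, returns)
opt.zero_grad(); loss.backward(); opt.step()

The entropy bonus in this loss is the algorithm's main safeguard against premature convergence; part of what makes deceptive games hard for such agents is that the policy can collapse onto a locally rewarding behavior before ever observing the globally better one.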
