Real-Time Navigation in Classical Platform Games via Skill Reuse

In platform video games, players are frequently tasked with solving medium-term navigation problems in order to gather items or power-ups. Artificial agents must generally obtain some form of direct experience before they can solve such tasks. Experience is gained either through training runs or by exploiting knowledge of the game's physics to generate detailed simulations. Human players, on the other hand, seem to look ahead in high-level, abstract steps. Motivated by human play, we introduce an approach that leverages not only abstract "skills", but also knowledge of what those skills can and cannot achieve. We apply this approach to Infinite Mario, where, despite facing randomly generated, maze-like levels, our agent derives complex plans in real time without relying on perfect knowledge of the game's physics.
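
To make the idea of planning over abstract skills concrete, the sketch below searches for a sequence of skills, each annotated with a predicate describing where it is known to apply and an abstract effect on a simplified tile-grid state. This is an illustration only, not the paper's implementation: the names (Skill, plan_with_skills, walk_right, jump_up) and the grid-state abstraction are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Abstract state: the agent's (x, y) tile position in a simplified level model.
State = tuple[int, int]

@dataclass(frozen=True)
class Skill:
    """A temporally extended action plus declarative knowledge of its scope."""
    name: str
    can_apply: Callable[[State], bool]   # where the skill is known to succeed
    effect: Callable[[State], State]     # abstract outcome if it succeeds

def plan_with_skills(start: State, goal: State, skills: Iterable[Skill],
                     max_depth: int = 50) -> Optional[list[str]]:
    """Breadth-first search over abstract states, expanding only applicable skills."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        if len(plan) >= max_depth:
            continue
        for skill in skills:
            if not skill.can_apply(state):
                continue
            nxt = skill.effect(state)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [skill.name]))
    return None  # no skill sequence reaches the goal within max_depth

# Hypothetical skills: walking one tile right, and a jump limited to low ledges.
walk_right = Skill("walk_right", lambda s: True, lambda s: (s[0] + 1, s[1]))
jump_up    = Skill("jump_up", lambda s: s[1] < 4, lambda s: (s[0] + 1, s[1] + 1))

print(plan_with_skills((0, 0), (5, 2), [walk_right, jump_up]))
```

Because the search operates on abstract states and skill outcomes rather than frame-by-frame physics, plans stay short and cheap to compute, which is what allows this style of lookahead to run in real time.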
