Solving Interleaved and Blended Sequential Decision-Making Problems through Modular Neuroevolution

Many challenging sequential decision-making problems require agents to master multiple tasks, such as defense and offense in games. Learning algorithms thus benefit from having separate policies for these tasks, and from knowing when each one is appropriate. How well such methods work depends on the nature of the tasks: interleaved tasks are disjoint and have different semantics, whereas blended tasks have regions where semantics from different tasks overlap. While many methods work well in interleaved tasks, blended tasks are difficult for methods with strict, human-specified task divisions, such as Multitask Learning. In such problems, task divisions should be discovered automatically. To demonstrate the power of this approach, the MM-NEAT neuroevolution framework is applied in this paper to two variants of the challenging video game of Ms. Pac-Man. In the simplified interleaved version of the game, the results demonstrate when and why such machine-discovered task divisions are useful. In the standard blended version of the game, a surprising, highly effective machine-discovered task division surpasses human-specified divisions, achieving the best scores reported to date for this game. Modular neuroevolution is thus a promising technique for discovering multimodal behavior for challenging real-world tasks.
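To make the module-arbitration idea concrete, the following is a minimal sketch (not the paper's implementation) of a modular policy in the spirit of MM-NEAT: each module produces action outputs plus a preference value, and on every time step the module with the highest preference takes control, so the task division emerges from the network rather than from a human-specified rule. The class name, linear modules, and dimensions are hypothetical illustrations.

```python
import numpy as np

class ModularPolicy:
    """Toy multi-module policy with preference-neuron arbitration."""

    def __init__(self, num_inputs, num_actions, num_modules, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # One weight matrix per module; each module outputs
        # num_actions action values plus one preference neuron.
        self.weights = [rng.normal(0.0, 1.0, (num_actions + 1, num_inputs))
                        for _ in range(num_modules)]

    def act(self, sensors):
        sensors = np.asarray(sensors, dtype=float)
        outputs = [w @ sensors for w in self.weights]   # forward pass of every module
        preferences = [out[-1] for out in outputs]      # last output = preference neuron
        chosen = int(np.argmax(preferences))            # arbitration: highest preference wins
        action_values = outputs[chosen][:-1]            # only the winning module acts
        return int(np.argmax(action_values)), chosen

# Usage: a hypothetical 3-module agent picking one of 4 movement directions
# from a 10-dimensional sensor vector.
policy = ModularPolicy(num_inputs=10, num_actions=4, num_modules=3)
action, module_used = policy.act(np.random.rand(10))
print(f"module {module_used} chose action {action}")
```

In an evolutionary setting, the weights (and, in NEAT-style methods, the network topology and number of modules) would be evolved against game score, which is how a division such as the one reported for blended Ms. Pac-Man could be discovered rather than hand-designed.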
