Effects of Different Optimization Formulations in Evolutionary Reinforcement Learning on Diverse Behavior Generation

Generating diverse strategies for a given task is challenging, yet it has been shown to benefit the main learning process, for example through improved behavior exploration. With growing interest in solution heterogeneity in both evolutionary computation and reinforcement learning, many promising approaches have emerged. To better understand how to guide multiple policies toward distinct strategies and benefit from their diversity, we need to further analyze how reward signal modulation and other evolutionary mechanisms influence the resulting behaviors. To that end, this paper considers an existing evolutionary reinforcement learning framework that uses multi-objective optimization to obtain policies that succeed at behavior-related objectives while still completing the main goal. Experiments on Atari games show that optimization formulations that do not treat objectives equally fail to generate diversity and even produce agents that are worse at solving the task at hand, regardless of the behaviors obtained.
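
To make the distinction between optimization formulations concrete, the following is a minimal sketch, not the paper's actual framework, of two ways a population-based loop might rank candidate policies given a task-return objective and a behavior-diversity objective: a weighted scalarization (which does not treat objectives equally) versus Pareto non-dominated selection (which does). All names (`Candidate`, `task_return`, `diversity`) are illustrative assumptions.

```python
# Hedged sketch: comparing a weighted-sum formulation with Pareto-based
# selection over two objectives (task return and behavioral diversity).
# The candidate structure and scores are hypothetical placeholders.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    task_return: float   # episodic return on the main goal
    diversity: float     # distance of this policy's behavior to the population


def weighted_sum(cands, w_task=0.8, w_div=0.2):
    """Scalarized formulation: objectives are traded off by fixed weights."""
    return sorted(
        cands,
        key=lambda c: w_task * c.task_return + w_div * c.diversity,
        reverse=True,
    )


def pareto_front(cands):
    """Pareto formulation: keep candidates not dominated on either objective."""
    front = []
    for c in cands:
        dominated = any(
            o.task_return >= c.task_return and o.diversity >= c.diversity
            and (o.task_return > c.task_return or o.diversity > c.diversity)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front


if __name__ == "__main__":
    random.seed(0)
    population = [Candidate(random.gauss(100, 30), random.random()) for _ in range(20)]
    print("Top 3 by weighted sum:", weighted_sum(population)[:3])
    print("Pareto front size:", len(pareto_front(population)))
```

Under the weighted-sum formulation the selection pressure collapses onto whichever objective dominates the weights, whereas the Pareto front retains policies that trade task return for distinct behaviors, which is the contrast the paper's experiments examine.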
