Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

While the history of machine learning so far largely encompasses a series of problems posed by researchers and algorithms that learn their solutions, an important question is whether the problems themselves can be generated by the algorithm at the same time as they are being solved. Such a process would in effect build its own diverse and expanding curricula, and the solutions to problems at various stages would become stepping stones towards solving even more challenging problems later in the process. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper does just that: it pairs the generation of environmental challenges and the optimization of agents to solve those challenges. It simultaneously explores many different paths through the space of possible problems and solutions and, critically, allows these stepping-stone solutions to transfer between problems if better, catalyzing innovation. The term open-ended signifies the intriguing potential for algorithms like POET to continue to create novel and increasingly complex capabilities without bound. Our results show that POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved by direct optimization alone, or even through a direct-path curriculum-building control algorithm introduced to highlight the critical role of open-endedness in solving ambitious challenges. The ability to transfer solutions from one environment to another proves essential to unlocking the full potential of the system as a whole, demonstrating the unpredictable nature of fortuitous stepping stones. We hope that POET will inspire a new push towards open-ended discovery across many domains, where algorithms like POET can blaze a trail through their interesting possible manifestations and solutions.
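To make the paired structure concrete, the following is a minimal sketch of the POET outer loop, not the paper's implementation. Environments are reduced here to a single difficulty scalar and agents to a single parameter; the score function, the minimal-criterion bounds, the population cap, and the hill-climbing optimizer are all simplifying assumptions introduced for illustration (the paper pairs parameterized bipedal-walker terrains with neural-network controllers optimized by an evolution strategy). The sketch only shows the three ingredients the abstract names: generating new environments from existing ones, locally optimizing each paired agent, and transferring agents across environments when they score better than the incumbent.

```python
# Minimal, self-contained sketch of the POET outer loop (toy setting, not the
# authors' implementation). An environment is a single "difficulty" scalar and
# an agent is a single parameter, so the paired structure, the minimal
# criterion for admitting new environments, and cross-environment transfer
# are visible without the full bipedal-walker setup used in the paper.
import random

MC_LOW, MC_HIGH = 0.2, 0.9   # assumed minimal-criterion bounds on admissible score
POP_CAP = 10                 # assumed cap on active (environment, agent) pairs


def evaluate(env, agent):
    """Toy score: the agent succeeds to the extent it matches the
    environment's difficulty. Stands in for a rollout return."""
    return max(0.0, 1.0 - abs(env - agent))


def optimize(env, agent, steps=20, sigma=0.05):
    """Toy local optimizer (random-search hill climbing); the paper uses an
    evolution strategy for this inner loop."""
    best, best_score = agent, evaluate(env, agent)
    for _ in range(steps):
        cand = best + random.gauss(0.0, sigma)
        score = evaluate(env, cand)
        if score > best_score:
            best, best_score = cand, score
    return best


def poet(iterations=200, mutate_every=10, transfer_every=5):
    pairs = [(0.0, 0.0)]  # start from a trivial environment and a naive agent
    for it in range(1, iterations + 1):
        # 1) Periodically generate child environments from active ones and
        #    keep only those that are neither too easy nor too hard (MC check).
        if it % mutate_every == 0:
            children = []
            for env, agent in pairs:
                child_env = env + abs(random.gauss(0.0, 0.3))
                score = evaluate(child_env, agent)
                if MC_LOW <= score <= MC_HIGH:
                    children.append((child_env, agent))
            pairs = (pairs + children)[-POP_CAP:]  # drop oldest pairs past the cap

        # 2) Optimize every agent a little within its paired environment.
        pairs = [(env, optimize(env, agent)) for env, agent in pairs]

        # 3) Periodically attempt transfers: adopt another pair's agent if it
        #    outperforms the incumbent in this environment.
        if it % transfer_every == 0:
            agents = [agent for _, agent in pairs]
            pairs = [(env, max(agents, key=lambda a: evaluate(env, a)))
                     for env, _ in pairs]
    return pairs


if __name__ == "__main__":
    for env, agent in poet():
        print(f"difficulty={env:.2f}  score={evaluate(env, agent):.2f}")
```

Even in this toy form, the transfer step is what lets a solution honed on one environment seed progress on another, mirroring the stepping-stone effect the abstract describes.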
