Evolutionary Computation for Reinforcement Learning

Evolutionary computation algorithms, which simulate natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces, and cope with partial observability, evolutionary reinforcement-learning approaches have a strong empirical track record and sometimes significantly outperform temporal-difference methods. This chapter surveys research on applying evolutionary computation to reinforcement learning. It covers methods for evolving neural-network topologies and weights, hybrid methods that also use temporal-difference learning, coevolutionary methods for multi-agent settings, generative and developmental systems, and methods for on-line evolutionary reinforcement learning.
