Paper Information - Evolutionary Computation for Reinforcement Learning

Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces, and cope with partial observability, evolutionary reinforcement-learning approaches have a strong empirical track record, sometimes significantly outperforming temporal-difference methods. This chapter surveys research on the application of evolutionary computation to reinforcement learning, overviewing methods for evolving neural-network topologies and weights, hybrid methods that also use temporal-difference methods, coevolutionary methods for multi-agent settings, generative and developmental systems, and methods for on-line evolutionary reinforcement learning.

Shimon Whiteson
