Comparing evolutionary and temporal difference methods in a reinforcement learning domain

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods' relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT [19], a GA that evolves neural networks, with Sarsa [16, 17], a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.

[1]  David E. Goldberg,et al.  Genetic Algorithms with Sharing for Multimodalfunction Optimization , 1987, ICGA.

[2]  M. J. D. Powell,et al.  Radial basis functions for multivariable interpolation: a review , 1987 .

[3]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[4]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[5]  Michael O. Duff,et al.  Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[6]  Richard S. Sutton,et al.  Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .

[7]  Larry D. Pyeatt,et al.  A comparison between cellular encoding and direct encoding for genetic neural networks , 1996 .

[8]  Jordan B. Pollack,et al.  Coevolution of a Backgammon Player , 1996 .

[9]  Ian Frank,et al.  Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..

[10]  Peter Stagge,et al.  Averaging Efficiently in the Presence of Noise , 1998, PPSN.

[11]  X. Yao Evolving Artificial Neural Networks , 1999 .

[12]  Kenneth A. De Jong,et al.  Cooperative Coevolution: An Architecture for Evolving Coadapted Subcomponents , 2000, Evolutionary Computation.

[13]  Huosheng Hu,et al.  KaBaGe-RL: Kanerva-based generalisation and reinforcement learning for possession football , 2001, Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180).

[14]  Steven M. Gustafson,et al.  Genetic Programming And Multi-agent Layered Learning By Reinforcements , 2002, GECCO.

[15]  Risto Miikkulainen,et al.  Evolving Neural Networks through Augmenting Topologies , 2002, Evolutionary Computation.

[16]  Sandor Markon,et al.  Threshold selection, hypothesis tests, and DOE methods , 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[17]  Risto Miikkulainen,et al.  Evolving Keepaway Soccer Players through Task Decomposition , 2003, GECCO.

[18]  Risto Miikkulainen,et al.  Robust non-linear control through neuroevolution , 2003 .

[19]  Richard S. Sutton,et al.  Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.

[20]  Risto Miikkulainen,et al.  Competitive Coevolution through Evolutionary Complexification , 2011, J. Artif. Intell. Res..

[21]  Risto Miikkulainen,et al.  Efficient Reinforcement Learning through Symbiotic Evolution , 2004 .

[22]  Risto Miikkulainen,et al.  Evolving a Roving Eye for Go , 2004, GECCO.

[23]  Peter Stone,et al.  Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..

[24]  Risto Miikkulainen,et al.  Evolving Soccer Keepaway Players Through Task Decomposition , 2005, Machine Learning.

[25]  Peter Stone,et al.  Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.

[26]  Nicholas J. Radcliffe,et al.  Genetic set recombination and its application to neural network topology optimisation , 1993, Neural Computing & Applications.

[27]  Jürgen Schmidhuber,et al.  Co-evolving recurrent neurons learn deep memory POMDPs , 2005, GECCO '05.

[28]  Shimon Whiteson,et al.  Evolutionary Function Approximation for Reinforcement Learning , 2006, J. Mach. Learn. Res..