State Aggregation for Reinforcement Learning using Neuroevolution

In this paper, we present a new machine learning algorithm, RL-SANE, which combines neuroevolution (NE) with traditional reinforcement learning (RL) techniques to improve learning performance. RL-SANE is an innovative combination of the neuroevolutionary algorithm NEAT (Stanley, 2004) and the RL algorithm Sarsa(λ) (Sutton and Barto, 1998). It exploits NEAT's ability to generate and train customized neural networks, using them to reduce the size of the state space through state aggregation. This reduction enables Sarsa(λ) to be applied to much more difficult problems than standard tabular approaches can handle. Previous similar work in this area, such as that of Whiteson and Stone (2006) and Stanley and Miikkulainen (2001), has shown positive and promising results. This paper gives a brief overview of neuroevolutionary methods, introduces the RL-SANE algorithm, presents a comparative analysis of RL-SANE against other neuroevolutionary algorithms, and concludes with a discussion of enhancements that need to be made to RL-SANE.
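To make the division of labor concrete, the following is a minimal sketch of the aggregation-plus-Sarsa(λ) idea the abstract describes. It is not the paper's implementation: the network here has fixed random weights standing in for a NEAT-evolved topology, and the names aggregate_state, env_reset, env_step, N_BUCKETS, and all hyperparameters are illustrative assumptions.

import numpy as np

# Stand-in for a NEAT-evolved aggregation network: a small fixed MLP that
# compresses a continuous observation into one of N_BUCKETS discrete states.
# In RL-SANE, NEAT would evolve this network's topology and weights.
rng = np.random.default_rng(0)
N_BUCKETS, N_ACTIONS, OBS_DIM = 20, 2, 4
W1 = rng.normal(size=(OBS_DIM, 8))
W2 = rng.normal(size=(8, 1))

def aggregate_state(obs):
    """Map a raw observation to a discrete bucket index (state aggregation)."""
    h = np.tanh(obs @ W1)
    score = 1.0 / (1.0 + np.exp(-(h @ W2)[0]))   # squash output to (0, 1)
    return min(int(score * N_BUCKETS), N_BUCKETS - 1)

def sarsa_lambda_episode(env_reset, env_step, Q, alpha=0.1, gamma=0.99,
                         lam=0.9, epsilon=0.1, max_steps=200):
    """One tabular Sarsa(lambda) episode over the aggregated state space."""
    E = np.zeros_like(Q)                          # eligibility traces
    s = aggregate_state(env_reset())
    a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(Q[s].argmax())
    for _ in range(max_steps):
        obs, r, done = env_step(a)                # assumed environment interface
        s2 = aggregate_state(obs)
        a2 = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(Q[s2].argmax())
        delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
        E[s, a] += 1.0                            # accumulating traces
        Q += alpha * delta * E
        E *= gamma * lam
        if done:
            break
        s, a = s2, a2
    return Q

An outer NEAT-style loop would then score each candidate aggregation network by the return its inner Sarsa(λ) learner achieves, so evolution searches the space of state aggregations rather than the policy directly.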

[1] Jonathan H. Connell and Sridhar Mahadevan (eds.). Robot Learning. Kluwer, Boston, 1993.

[2] Faustino Gomez and Risto Miikkulainen. Solving Non-Markovian Control Tasks with Neuro-Evolution. IJCAI, 1999.

[3] Kenneth O. Stanley. Efficient Evolution of Neural Networks Through Complexification. PhD thesis, The University of Texas at Austin, 2004.

[4] Gerald Sommer et al. Efficient Learning of Neural Networks with Evolutionary Algorithms. DAGM-Symposium, 2007.

[5] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning Representations by Back-Propagating Errors. Nature, 1986.

[6] David E. Moriarty and Risto Miikkulainen. Forming Neural Networks Through Efficient and Adaptive Coevolution. Evolutionary Computation, 1997.

[7] Gerald Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995.

[8] Pere Ridao et al. Learning Reactive Robot Behaviors with Neural-Q_learning. 2002.

[9] Derek James et al. A Comparative Analysis of Simplification and Complexification in the Evolution of Neural Network Topologies. 2004.

[10] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 1992.

[11] Kenneth O. Stanley and Risto Miikkulainen. Efficient Reinforcement Learning Through Evolving Neural Network Topologies. GECCO, 2002.

[12] Kenneth O. Stanley and Risto Miikkulainen. Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation, 2002.

[13] Shimon Whiteson and Peter Stone. Evolutionary Function Approximation for Reinforcement Learning. Journal of Machine Learning Research, 2006.

[14] Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan. Reinforcement Learning with Soft State Aggregation. NIPS, 1994.

[15] Justin A. Boyan and Andrew W. Moore. Generalization in Reinforcement Learning: Safely Approximating the Value Function. NIPS, 1994.