Generalization in Reinforcement Learning

In this paper we evaluate two temporal-difference (TD) reinforcement learning methods on several different tasks to assess how well they generalize. Each task was modeled as a Markov decision process with a continuous observation space and a discrete action space, and the tasks were taken from the Polyathlon domain of the 2009 Reinforcement Learning Competition. Function approximation was done with linear gradient descent, using radial basis functions (RBFs) as features. We found that the more sophisticated method generalized better, but both methods were sensitive to their parameter settings, leaving considerable room for improvement.
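As a rough illustration of this function-approximation setup, and not the code used in the experiments, the sketch below shows semi-gradient Sarsa(0) with linear action values over Gaussian RBF features. The RBF centers, widths, and learning parameters are placeholders chosen for the example.

```python
import numpy as np

def rbf_features(obs, centers, widths):
    """Gaussian RBF activations for a continuous observation vector."""
    # centers: (num_features, obs_dim); widths: scalar bandwidth (assumed here).
    diffs = centers - obs                      # broadcast over feature centers
    sq_dist = np.sum(diffs ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

class LinearSarsa:
    """Semi-gradient Sarsa(0) with one linear weight vector per discrete action."""

    def __init__(self, centers, widths, num_actions,
                 alpha=0.1, gamma=0.99, epsilon=0.1):
        self.centers = centers
        self.widths = widths
        self.num_actions = num_actions
        self.alpha = alpha          # step size for gradient descent
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate
        self.w = np.zeros((num_actions, centers.shape[0]))

    def q(self, obs):
        """Action values and the feature vector for an observation."""
        phi = rbf_features(obs, self.centers, self.widths)
        return self.w @ phi, phi

    def act(self, obs):
        """Epsilon-greedy action selection over the discrete action set."""
        q_values, _ = self.q(obs)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.num_actions)
        return int(np.argmax(q_values))

    def update(self, obs, action, reward, next_obs, next_action, done):
        """One TD update; the gradient of Q(s, a) w.r.t. w[a] is just phi."""
        q_values, phi = self.q(obs)
        target = reward
        if not done:
            next_q, _ = self.q(next_obs)
            target += self.gamma * next_q[next_action]
        td_error = target - q_values[action]
        self.w[action] += self.alpha * td_error * phi
```

In this linear setting the update reduces to adjusting the active RBF weights in proportion to the TD error, which is why the choice of RBF centers and widths (and the step size) strongly affects how the learned values transfer across tasks.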