Generalization in Reinforcement Learning

In this paper we evaluate two temporal-difference (TD) reinforcement learning methods on several different tasks to assess how well they generalize. Each task was modeled as a Markov decision process with a continuous observation space and a discrete action space, and the tasks were taken from the Polyathlon domain of the 2009 Reinforcement Learning Competition. Function approximation was done with linear gradient descent, using radial basis functions (RBFs) as features. We found that the more sophisticated method generalized better, but both methods were sensitive to their parameter settings, leaving considerable room for improvement.
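As a rough illustration of this function-approximation setup, and not the code used in the experiments, the sketch below shows semi-gradient Sarsa(0) with linear action values over Gaussian RBF features. The RBF centers, widths, and learning parameters are placeholders chosen for the example.

```python
import numpy as np

def rbf_features(obs, centers, widths):
    """Gaussian RBF activations for a continuous observation vector."""
    # centers: (num_features, obs_dim); widths: scalar bandwidth (assumed here).
    diffs = centers - obs                      # broadcast over feature centers
    sq_dist = np.sum(diffs ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

class LinearSarsa:
    """Semi-gradient Sarsa(0) with one linear weight vector per discrete action."""

    def __init__(self, centers, widths, num_actions,
                 alpha=0.1, gamma=0.99, epsilon=0.1):
        self.centers = centers
        self.widths = widths
        self.num_actions = num_actions
        self.alpha = alpha          # step size for gradient descent
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate
        self.w = np.zeros((num_actions, centers.shape[0]))

    def q(self, obs):
        """Action values and the feature vector for an observation."""
        phi = rbf_features(obs, self.centers, self.widths)
        return self.w @ phi, phi

    def act(self, obs):
        """Epsilon-greedy action selection over the discrete action set."""
        q_values, _ = self.q(obs)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.num_actions)
        return int(np.argmax(q_values))

    def update(self, obs, action, reward, next_obs, next_action, done):
        """One TD update; the gradient of Q(s, a) w.r.t. w[a] is just phi."""
        q_values, phi = self.q(obs)
        target = reward
        if not done:
            next_q, _ = self.q(next_obs)
            target += self.gamma * next_q[next_action]
        td_error = target - q_values[action]
        self.w[action] += self.alpha * td_error * phi
```

In this linear setting the update reduces to adjusting the active RBF weights in proportion to the TD error, which is why the choice of RBF centers and widths (and the step size) strongly affects how the learned values transfer across tasks.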