Basis Function Construction in Reinforcement Learning Using Cascade-Correlation Learning Architecture

In reinforcement learning, it is common practice to map the state(-action) space to a different one using basis functions. This transformation aims to represent the input data in a more informative form, which facilitates and improves subsequent steps. Since a "good" set of basis functions results in better solutions, and defining such functions by hand becomes harder as problem complexity grows, it is beneficial to be able to generate them automatically. In this paper, we propose a new approach, based on the Bellman residual, for constructing basis functions using the cascade-correlation learning architecture. We show how this approach can be applied to the Least-Squares Policy Iteration (LSPI) algorithm in order to obtain a better approximation of the value function and, consequently, improve the performance of the resulting policies. We also demonstrate the effectiveness of the method empirically on several benchmark problems.
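
The idea described in the abstract can be pictured roughly as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: the function names (`lstdq_weights`, `bellman_residual`, `train_candidate_unit`), the single sigmoid candidate unit, and the plain regularized least-squares solve are placeholders chosen for illustration; the paper's cascade-correlation networks and LSPI details may differ.

```python
import numpy as np

def lstdq_weights(phi, phi_next, rewards, gamma=0.99, reg=1e-6):
    """Least-squares fixed-point weights for the current basis (LSTD-style solve)."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)

def bellman_residual(phi, phi_next, rewards, w, gamma=0.99):
    """Per-sample Bellman residual of the current linear value estimate."""
    return rewards + gamma * phi_next @ w - phi @ w

def train_candidate_unit(states, residual, lr=0.5, epochs=500, seed=0):
    """Cascade-correlation-style candidate training: gradient ascent on the
    covariance between a sigmoid unit's output and the Bellman residual
    (the unit's mean output is treated as constant, as in standard Cascor).
    The trained unit is returned as a new basis function."""
    rng = np.random.default_rng(seed)
    v = rng.normal(scale=0.1, size=states.shape[1])
    e = residual - residual.mean()
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-states @ v))              # candidate outputs
        sign = 1.0 if np.dot(h - h.mean(), e) >= 0 else -1.0
        grad = sign * (states.T @ (e * h * (1.0 - h)))      # d(covariance)/dv
        v = v + lr * grad / len(states)
    return lambda s: 1.0 / (1.0 + np.exp(-s @ v))           # new basis function

# Sketch of one basis-expansion iteration (hypothetical variable names):
#   w     = lstdq_weights(phi, phi_next, rewards)
#   res   = bellman_residual(phi, phi_next, rewards, w)
#   new_f = train_candidate_unit(raw_states, res)
#   phi   = np.column_stack([phi, new_f(raw_states)])   # append, then rerun LSPI
```

The sketch only conveys the loop structure: solve for the value function on the current features, measure where it violates the Bellman equation, train a candidate unit that tracks that error, and add it as a feature before repeating.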
