Regularization and feature selection in least-squares temporal difference learning

We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can overfit the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of ℓ1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the ℓ1-regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
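
To make the quantities in the abstract concrete, the sketch below shows standard LSTD alongside a naive fixed-point iteration for its ℓ1-regularized variant. This is an illustrative assumption, not the paper's method: the sample matrices `phi` and `phi_next` (features of current and successor states), the reward vector `r`, the use of scikit-learn's Lasso, and the regularization scaling are all hypothetical choices, and the paper computes the same fixed point with a LARS-style homotopy algorithm rather than by repeatedly solving a Lasso problem.

```python
import numpy as np
from sklearn.linear_model import Lasso


def lstd(phi, phi_next, r, gamma=0.99, ridge=1e-6):
    """Plain LSTD: solve A w = b with A = Phi^T (Phi - gamma * Phi'), b = Phi^T r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ r
    # A small ridge term keeps A invertible when features are redundant or
    # samples are scarce -- exactly the large-feature regime the abstract warns about.
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)


def l1_lstd_fixed_point(phi, phi_next, r, gamma=0.99, beta=0.1, iters=50):
    """Naive iteration toward the l1-regularized LSTD fixed point
        w = argmin_u ||Phi u - (r + gamma * Phi' w)||^2 + beta * ||u||_1,
    where the regression target depends on w itself. This loop is only a
    sketch and is not guaranteed to converge; the paper's LARS-style
    algorithm computes the fixed point directly instead."""
    n, k = phi.shape
    w = np.zeros(k)
    # scikit-learn's Lasso minimizes (1/(2n))||y - Xu||^2 + alpha*||u||_1,
    # so alpha = beta / (2n) matches the objective above (an assumed scaling).
    lasso = Lasso(alpha=beta / (2 * n), fit_intercept=False, max_iter=10000)
    for _ in range(iters):
        target = r + gamma * (phi_next @ w)  # bootstrapped TD targets
        w = lasso.fit(phi, target).coef_
    return w
```

The inner regression makes clear why this is not a single convex program: the target r + γΦ'w moves with the weights being solved for, which is the difficulty the LARS-like algorithm described in the paper is designed to handle.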
