L1 Regularized Linear Temporal Difference Learning

Several recent efforts in reinforcement learning have drawn attention to the importance of regularization, but how best to incorporate regularization into reinforcement learning algorithms, and how doing so affects their convergence, remain open areas of research. In particular, little has been written about regularization in online reinforcement learning. In this paper, we describe a novel online stochastic approximation algorithm for reinforcement learning. We prove that the online algorithm converges and show that the L1 regularized linear fixed point of LARS-TD and LC-TD is an equilibrium fixed point of the algorithm.
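To make the idea of an online L1 regularized update concrete, the following is a minimal sketch under stated assumptions, not the algorithm analyzed in this paper. It pairs a standard linear TD(0) step with a soft-thresholding (proximal) step, one common way to apply an L1 penalty online. The function names `soft_threshold` and `l1_td0`, the parameters `alpha` (step size) and `beta` (regularization weight), and the constant step-size schedule are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def l1_td0(features, rewards, next_features,
           gamma=0.9, beta=0.01, alpha=0.05, sweeps=200):
    """Illustrative L1 regularized linear TD(0) (assumed form, not the paper's).

    features, next_features: arrays of shape (n_samples, n_features) holding
    the feature vectors of each visited state and its successor.
    rewards: array of shape (n_samples,).
    """
    theta = np.zeros(features.shape[1])
    for _ in range(sweeps):
        for phi, r, phi_next in zip(features, rewards, next_features):
            # Temporal difference error for the sampled transition.
            delta = r + gamma * (phi_next @ theta) - (phi @ theta)
            # TD(0) step followed by shrinkage scaled by the step size.
            theta = soft_threshold(theta + alpha * delta * phi, alpha * beta)
    return theta
```

In this sketch a larger `beta` drives more weights exactly to zero, which is the sparsity effect that motivates L1 regularization; whether the iterates reach the L1 regularized linear fixed point discussed above depends on conditions that the sketch does not check.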
