The family of Gradient Temporal-Difference (GTD) learning algorithms shares the promising property of remaining stable under both linear function approximation and off-policy training. The success of the GTD family, however, depends on a suitable set of features, which are unfortunately not always available in practice. To overcome this difficulty, regularization is often employed as an effective feature-selection method in reinforcement learning. In the present work, we propose and investigate a family of ℓ1-regularized GTD learning algorithms.
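To make the idea concrete, the following is a minimal sketch of one off-policy TDC-style (GTD-family) update combined with an ℓ1 proximal (soft-thresholding) step. It is an illustrative assumption about how such a regularized update could look, not the paper's actual algorithm; the function names, step sizes, and the placement of the proximal step are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: shrink each coordinate toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l1_tdc_step(theta, w, phi, phi_next, reward, gamma, rho, alpha, beta, lam):
    """One off-policy TDC-style update followed by an l1 proximal step (illustrative sketch).

    theta      -- primary weight vector (linear value-function parameters)
    w          -- auxiliary weight vector used by GTD-family methods
    phi        -- feature vector of the current state
    phi_next   -- feature vector of the next state
    rho        -- importance-sampling ratio for off-policy correction
    alpha,beta -- step sizes for theta and w
    lam        -- l1 regularization strength
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    # TDC-style update of the primary weights, with importance weighting
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    # l1 proximal (soft-thresholding) step encourages sparse feature weights
    theta = soft_threshold(theta, alpha * lam)
    # Auxiliary weights estimate the expected TD error as a linear function of phi
    w = w + beta * rho * (delta - w @ phi) * phi
    return theta, w
```

The soft-thresholding step is the standard proximal operator for the ℓ1 penalty, so coordinates whose accumulated update stays below the threshold are driven exactly to zero, which is what makes the regularizer act as a feature selector.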