The family of Gradient Temporal-Difference (GTD) learning algorithms shares the promising property of remaining stable under both linear function approximation and off-policy training. The success of the GTD family, however, depends on a suitable set of features, which are unfortunately not always available in practice. To overcome this difficulty, regularization is often employed as an effective feature-selection method in reinforcement learning. In the present work, we propose and investigate a family of ℓ1-regularized GTD learning algorithms.
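To make the idea concrete, the following is a minimal sketch of one off-policy TDC-style (GTD-family) update combined with an ℓ1 proximal (soft-thresholding) step. It is an illustrative assumption about how such a regularized update could look, not the paper's actual algorithm; the function names, step sizes, and the placement of the proximal step are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: shrink each coordinate toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l1_tdc_step(theta, w, phi, phi_next, reward, gamma, rho, alpha, beta, lam):
    """One off-policy TDC-style update followed by an l1 proximal step (illustrative sketch).

    theta      -- primary weight vector (linear value-function parameters)
    w          -- auxiliary weight vector used by GTD-family methods
    phi        -- feature vector of the current state
    phi_next   -- feature vector of the next state
    rho        -- importance-sampling ratio for off-policy correction
    alpha,beta -- step sizes for theta and w
    lam        -- l1 regularization strength
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    # TDC-style update of the primary weights, with importance weighting
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    # l1 proximal (soft-thresholding) step encourages sparse feature weights
    theta = soft_threshold(theta, alpha * lam)
    # Auxiliary weights estimate the expected TD error as a linear function of phi
    w = w + beta * rho * (delta - w @ phi) * phi
    return theta, w
```

The soft-thresholding step is the standard proximal operator for the ℓ1 penalty, so coordinates whose accumulated update stays below the threshold are driven exactly to zero, which is what makes the regularizer act as a feature selector.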