Off-policy Learning with Options and Recognizers
Doina Precup | Richard S. Sutton | Satinder P. Singh | Cosmin Paduraru | Anna Koop
[1] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[2] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[3] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[4] Michael I. Jordan,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001 .
[5] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[6] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[7] Richard S. Sutton,et al. Temporal-Difference Networks , 2004, NIPS.
[8] Richard S. Sutton,et al. Temporal Abstraction in Temporal-difference Networks , 2005, NIPS.