Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
Richard S. Sutton, Adam White, Craig Sherstan, Martha White, Kenny Young, Brendan Bennett, Dylan R. Ashley
[1] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[2] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[3] Mohammad Ghavamzadeh, et al. Actor-Critic Algorithms for Risk-Sensitive MDPs, 2013, NIPS.
[4] S. Shankar Sastry, et al. Autonomous Helicopter Flight via Reinforcement Learning, 2003, NIPS.
[5] Shie Mannor, et al. Learning the Variance of the Reward-To-Go, 2016, J. Mach. Learn. Res.
[6] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[7] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[8] Mohammad Ghavamzadeh, et al. Variance-constrained actor-critic algorithms for discounted and average reward MDPs, 2014, Machine Learning.
[9] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[10] Martha White, et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning, 2016, AAMAS.
[11] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[12] Makoto Sato, et al. TD algorithm for the variance of return and mean-variance reinforcement learning, 2001.
[13] Shie Mannor, et al. Variance Adjusted Actor Critic Algorithms, 2013, arXiv.
[14] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[15] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[16] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.