Exploiting structure and uncertainty of Bellman updates in Markov decision processes
Andrea Bonarini | Marcello Restelli | Carlo D'Eramo | Alessandro Nuara | Davide Tateo
[1] Kunikazu Kobayashi, et al. A Meta-learning Method Based on Temporal Difference Error, 2009, ICONIP.
[2] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[3] Kenji Doya, et al. Meta-learning in Reinforcement Learning, 2003, Neural Networks.
[4] Damien Ernst, et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, 2015, ArXiv.
[5] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[6] Warren B. Powell, et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[7] Robert L. Winkler, et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis, 2006, Manag. Sci.
[8] Bingkun Bao, et al. Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process, 2008, 2008 3rd International Conference on Innovative Computing Information and Control.
[9] Naoto Yoshida, et al. Reinforcement learning with state-dependent discount factor, 2013, 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[11] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[12] Hado van Hasselt, et al. Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average, 2013, ArXiv.
[13] E. Steen. Rational Overoptimism (and Other Biases), 2004.
[14] Ambuj Tewari, et al. Bounded Parameter Markov Decision Processes with Average Reward Criterion, 2007, COLT.
[15] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[16] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[17] Marcello Restelli, et al. Estimating Maximum Expected Value through Gaussian Approximation, 2016, ICML.