Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs
[1] Alessandro Lazaric,et al. Regret Minimization in MDPs with Options without Prior Knowledge , 2017, NIPS.
[2] Ronald Ortner,et al. Regret Bounds for Reinforcement Learning via Markov Chain Concentration , 2020, J. Artif. Intell. Res.
[3] Yishay Mansour,et al. Convergence of Optimistic and Incremental Q-Learning , 2001, NIPS.
[4] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[5] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[6] Ronald Ortner,et al. Optimism in the Face of Uncertainty Should be Refutable , 2008, Minds and Machines.
[7] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[8] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[9] K. I. M. McKinnon,et al. On the Generation of Markov Decision Processes , 1995 .
[10] Achim Klenke,et al. Probability theory - a comprehensive course , 2008, Universitext.
[11] Ronald Ortner,et al. Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning , 2015, ICML.
[12] Shipra Agrawal,et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds , 2017, NIPS.
[13] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[14] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2010, J. Mach. Learn. Res.
[15] Ronald Ortner,et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning , 2012, NIPS.
[16] Claudio Gentile,et al. Improved Risk Tail Bounds for On-Line Algorithms , 2005, IEEE Transactions on Information Theory.
[17] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[18] D. Freedman. On Tail Probabilities for Martingales , 1975 .
[19] Marcus Hutter,et al. Count-Based Exploration in Feature Space for Reinforcement Learning , 2017, IJCAI.
[20] Michael T. Rosenstein,et al. Supervised Actor-Critic Reinforcement Learning , 2012 .
[21] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2017, NIPS.
[22] H. Teicher,et al. Probability theory: Independence, interchangeability, martingales , 1978 .
[23] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[24] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[25] Alessandro Lazaric,et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning , 2018, ICML.
[26] Sham M. Kakade,et al. Variance Reduction Methods for Sublinear Reinforcement Learning , 2018, ArXiv.
[27] Alessandro Lazaric,et al. Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes , 2018, NeurIPS.
[28] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[29] Mohammad Sadegh Talebi,et al. Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs , 2018, ALT.
[30] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[31] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.