Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
暂无分享,去创建一个
[1] E. Ordentlich,et al. Inequalities for the L1 Deviation of the Empirical Distribution , 2003 .
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[4] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[5] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[6] Massimiliano Pontil,et al. Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.
[7] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[8] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[9] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[10] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[11] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[12] Shie Mannor,et al. How hard is my MDP?" The distribution-norm to the rescue" , 2014, NIPS.
[13] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[14] Tor Lattimore,et al. Near-optimal PAC bounds for discounted MDPs , 2014, Theor. Comput. Sci..
[15] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[16] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[17] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[18] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[19] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[20] Benjamin Van Roy,et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning? , 2016, ICML.
[21] Nan Jiang,et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon , 2018, COLT.
[22] Emma Brunskill,et al. Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs , 2018, ICML.
[23] Mohammad Sadegh Talebi,et al. Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs , 2018, ALT.
[24] Alessandro Lazaric,et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning , 2018, ICML.
[25] Sham M. Kakade,et al. Variance Reduction Methods for Sublinear Reinforcement Learning , 2018, ArXiv.
[26] Lihong Li,et al. Policy Certificates: Towards Accountable Reinforcement Learning , 2018, ICML.