Worst-Case Regret Bounds for Exploration via Randomized Value Functions
[1] Christos Dimitrakakis, et al. Randomised Bayesian Least-Squares Policy Iteration, 2019, arXiv.
[2] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[3] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[4] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[5] Gábor Lugosi, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.
[6] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[7] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[8] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[9] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[10] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[11] Joelle Pineau, et al. Randomized Value Functions via Multiplicative Normalizing Flows, 2018, UAI.
[12] Benjamin Van Roy, et al. Ensemble Sampling, 2017, NIPS.
[13] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[14] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[15] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[16] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[17] Shipra Agrawal, et al. Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds, 2017, NIPS.
[18] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[19] Sham M. Kakade. On the Sample Complexity of Reinforcement Learning, 2003.
[20] Alessandro Lazaric, et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[21] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[22] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[23] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[24] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[25] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[26] Ronen I. Brafman, et al. R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[27] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[28] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[29] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.