Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation
[1] M. Mohri, et al. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning, 2022, NeurIPS.
[2] Doina Precup, et al. On the Expressivity of Markov Reward, 2021, NeurIPS.
[3] Tong Zhang, et al. Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning, 2021, SIAM Journal on Mathematics of Data Science.
[4] Kevin G. Jamieson, et al. Beyond No Regret: Instance-Dependent PAC Reinforcement Learning, 2021, COLT.
[5] Masatoshi Uehara, et al. Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage, 2021, ICLR.
[6] Julian Zimmert, et al. Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning, 2021, NeurIPS.
[7] Shachar Lovett, et al. Bilinear Classes: A Structural Framework for Provable Generalization in RL, 2021, ICML.
[8] Yujing Hu, et al. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping, 2020, NeurIPS.
[9] David Simchi-Levi, et al. Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective, 2020, COLT.
[10] Sheila A. McIlraith, et al. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning, 2020, J. Artif. Intell. Res.
[11] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[12] Georg Ostrovski, et al. Temporally-Extended ε-Greedy Exploration, 2020, ArXiv abs/2006.01782.
[13] Lin F. Yang, et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension, 2020, NeurIPS.
[14] Dylan J. Foster, et al. Naive Exploration is Optimal for Online LQR, 2020, ICML.
[15] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[16] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[17] Benjamin Recht, et al. Certainty Equivalence is Efficient for Linear Quadratic Control, 2019, NeurIPS.
[18] J. Langford, et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, 2018, COLT.
[19] Jon D. McAuliffe, et al. Time-uniform, nonparametric, nonasymptotic confidence sequences, 2018, The Annals of Statistics.
[20] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.
[21] Yao Liu, et al. When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms, 2018, ArXiv.
[22] Nan Jiang, et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.
[23] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[24] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[25] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[26] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[27] Nan Jiang, et al. On Structural Properties of MDPs that Bound Loss Due to Shallow Planning, 2016, IJCAI.
[28] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[29] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[30] Marc G. Bellemare, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[31] Karthik Sridharan, et al. Online Nonparametric Regression with General Loss Functions, 2015, ArXiv.
[32] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[33] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[34] Bhaskara Marthi, et al. Automatic shaping and decomposition of reward functions, 2007, ICML.
[35] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[36] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[37] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[38] Gerald DeJong, et al. The Influence of Reward on the Speed of Reinforcement Learning: An Analysis of Shaping, 2003, ICML.
[39] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[40] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[41] A. Singla, et al. Explicable Reward Design for Reinforcement Learning Agents, 2021, NeurIPS.
[42] N. Sanghi. Model-Free Approaches, 2021.