Rethinking Exploration for Sample-Efficient Policy Learning
William F. Whitney | Michael Bloesch | Jost Tobias Springenberg | Abbas Abdolmaleki | Martin A. Riedmiller