Tactical Optimism and Pessimism for Deep Reinforcement Learning
[1] Csaba Szepesvári, et al. Tuning Bandit Algorithms in Stochastic Environments, 2007, ALT.
[2] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[3] Carlos Riquelme, et al. Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates, 2019, NeurIPS.
[4] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[5] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[6] Krzysztof Choromanski, et al. Effective Diversity in Population-Based Reinforcement Learning, 2020, NeurIPS.
[7] Pieter Abbeel, et al. CURL: Contrastive Unsupervised Representations for Reinforcement Learning, 2020, ICML.
[8] Frederick R. Forst, et al. On robust estimation of the location parameter, 1980.
[9] Sarah Filippi, et al. Optimism in reinforcement learning and Kullback-Leibler divergence, 2010, 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[10] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[11] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[12] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[13] Krzysztof Choromanski, et al. On Optimism in Model-Based Reinforcement Learning, 2020, arXiv.
[14] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[15] Arthur Gretton, et al. Kernelized Wasserstein Natural Gradient, 2020, ICLR.
[16] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[17] Arthur Gretton, et al. Efficient Wasserstein Natural Gradients for Reinforcement Learning, 2020, ICLR.
[18] Julian Zimmert, et al. Model Selection in Contextual Stochastic Bandit Problems, 2020, NeurIPS.
[19] Bo Liu, et al. QUOTA: The Quantile Option Architecture for Reinforcement Learning, 2018, AAAI.
[20] Mohammad Norouzi, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.
[21] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[22] Christos Dimitrakakis, et al. Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities, 2019, arXiv.
[23] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[24] Honglak Lee, et al. Predictive Information Accelerates Learning in RL, 2020, NeurIPS.
[25] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[26] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[27] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[28] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[29] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[30] Quoc V. Le, et al. Evolving Reinforcement Learning Algorithms, 2021, arXiv.
[31] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[32] Philip J. Ball, et al. OffCon3: What is state of the art anyway?, 2021, arXiv.
[33] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[34] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[35] Krzysztof Choromanski, et al. Ready Policy One: World Building Through Active Learning, 2020, ICML.
[36] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[37] Michael I. Jordan, et al. Learning to Score Behaviors for Guided Policy Optimization, 2020, ICML.
[38] Haipeng Luo, et al. Corralling a Band of Bandit Algorithms, 2016, COLT.
[39] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[40] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[41] Ilya Kostrikov, et al. Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels, 2020, arXiv.
[42] Junhyuk Oh, et al. Discovering Reinforcement Learning Algorithms, 2020, NeurIPS.
[43] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[44] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[45] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[46] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[47] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[48] Tom Schaul, et al. Adapting Behaviour for Learning Progress, 2019, arXiv.
[49] Jia Yuan Yu, et al. A Scheme for Dynamic Risk-Sensitive Sequential Decision Making, 2019, arXiv.
[50] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[51] Pieter Abbeel, et al. Reinforcement Learning with Augmented Data, 2020, NeurIPS.
[52] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[53] Louis Kirsch, et al. Improving Generalization in Meta Reinforcement Learning using Learned Objectives, 2020, ICLR.
[54] Marc G. Bellemare, et al. Statistics and Samples in Distributional Reinforcement Learning, 2019, ICML.
[55] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[56] Claudio Gentile, et al. Regret Bound Balancing and Elimination for Model Selection in Bandits and RL, 2020, arXiv.
[57] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.