Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
Nino Vieillard | Tadashi Kozuno | Bruno Scherrer | Olivier Pietquin | Rémi Munos | Matthieu Geist
[1] Bruno Scherrer, et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, 2015, ICML.
[2] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[3] Matthieu Geist, et al. Approximate Modified Policy Iteration and its Application to the Game of Tetris, 2015, J. Mach. Learn. Res.
[4] Kenji Doya, et al. Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning, 2019, AISTATS.
[5] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[6] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, arXiv.
[7] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[8] Nicolas Le Roux, et al. Understanding the Impact of Entropy on Policy Optimization, 2018, ICML.
[9] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[10] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[11] Bruno Scherrer, et al. Momentum in Reinforcement Learning, 2020, AISTATS.
[12] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[13] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[14] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[15] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[16] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[17] Bruno Scherrer, et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, 2012, NIPS.
[18] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[19] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[20] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[21] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[22] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[23] Leon Hirsch, et al. Fundamentals of Convex Analysis, 2016.
[24] Matthieu Geist, et al. Munchausen Reinforcement Learning, 2020, NeurIPS.
[25] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, arXiv.
[26] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[27] L. Baird. Reinforcement Learning Through Gradient Descent, 1999.
[28] Matthieu Geist, et al. Softened Approximate Policy Iteration for Markov Games, 2016, ICML.
[29] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[30] Lawrence Carin, et al. Revisiting the Softmax Bellman Operator: New Benefits and New Perspective, 2018, ICML.
[31] Shane Legg, et al. Human-Level Control Through Deep Reinforcement Learning, 2015, Nature.
[32] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[33] Matthieu Geist, et al. Difference of Convex Functions Programming for Reinforcement Learning, 2014, NIPS.
[34] Matthieu Geist, et al. Is the Bellman Residual a Bad Proxy?, 2016, NIPS.
[35] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[36] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[37] Hilbert J. Kappen, et al. Dynamic Policy Programming, 2010, J. Mach. Learn. Res.
[38] Matthew Fellows, et al. VIREL: A Variational Inference Framework for Reinforcement Learning, 2018, NeurIPS.
[39] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration Using Expert Prediction, 2019, ICML.
[40] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.