Planning in entropy-regularized Markov decision processes and games
Michal Valko | Omar Darwiche Domingues | Jean-Bastien Grill | Pierre Ménard | Rémi Munos
[1] Tuomas Haarnoja et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[2] Jean-François Hren and Rémi Munos. Optimistic Planning of Deterministic Systems, 2008, EWRL.
[3] Balázs Szörényi et al. Optimistic Planning in Markov Decision Processes Using a Generative Model, 2014, NIPS.
[4] Lucian Busoniu and Rémi Munos. Optimistic planning for Markov decision processes, 2012, AISTATS.
[5] Richard S. Sutton. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[6] Matthieu Geist et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[7] Zohar Feldman and Carmel Domshlak. Simple Regret Optimization in Online Planning for Markov Decision Processes, 2012, J. Artif. Intell. Res.
[8] Volodymyr Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[9] Gergely Neu et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[10] Sébastien Bubeck and Rémi Munos. Open Loop Optimistic Planning, 2010, COLT.
[11] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[12] Peter L. Bartlett et al. Scale-free adaptive planning for deterministic dynamics & discounted rewards, 2019, ICML.
[13] Ruitong Huang et al. Structured Best Arm Identification with Fixed Confidence, 2017, ALT.
[14] Michael Kearns et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[15] Jean-Bastien Grill et al. Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, 2016, NIPS.
[16] Thomas Dueholm Hansen et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, 2010, JACM.
[17] Emilie Kaufmann and Wouter M. Koolen. Monte-Carlo Tree Search by Best Arm Identification, 2017, NIPS.
[18] Tuomas Haarnoja et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[19] Bo Dai et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[20] Edouard Leurent and Odalric-Ambrym Maillard. Practical Open-Loop Optimistic Planning, 2019, ECML/PKDD.
[21] Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning, 2006, ECML.
[22] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[23] John Schulman et al. Trust Region Policy Optimization, 2015, ICML.
[24] Rémi Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[25] Pierre-Arnaud Coquelin and Rémi Munos. Bandit Algorithms for Tree Search, 2007, UAI.
[26] Thomas J. Walsh et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[27] John Schulman et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.