On Principled Entropy Exploration in Policy Optimization
Jincheng Mei | Chenjun Xiao | Ruitong Huang | Dale Schuurmans | Martin Müller
[1] Masashi Sugiyama,et al. Guide Actor-Critic for Continuous Control , 2017, ICLR.
[2] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[3] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[6] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[7] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[8] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[9] Vicenç Gómez,et al. A unified view of entropy-regularized Markov decision processes , 2017, ArXiv.
[10] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[11] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[12] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[13] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.
[14] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[15] Jan Peters,et al. Learning of Non-Parametric Control Policies with High-Dimensional State Features , 2015, AISTATS.
[16] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[18] Dale Schuurmans,et al. Improving Policy Gradient by Exploring Under-appreciated Rewards , 2016, ICLR.
[19] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[22] A. Nemirovsky,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[23] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[24] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[25] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[26] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.