Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors.
[1] J. Rustagi. Optimization Techniques in Statistics, 1994.
[2] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[3] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[4] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[5] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[6] Warren B. Powell, et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[7] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[10] Zeb Kurth-Nelson, et al. A distributional code for value in dopamine-based reinforcement learning, 2020, Nature.
[11] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[14] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[15] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[16] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[17] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[18] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[19] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[20] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[21] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[22] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[23] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[24] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[25] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[26] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[27] Marc G. Bellemare, et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning, 2019, AAAI.
[28] Koray Kavukcuoglu, et al. Combining policy gradient and Q-learning, 2016, ICLR.
[29] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[30] Warren B. Powell, et al. Bias-Corrected Q-Learning With Multistate Extension, 2019, IEEE Transactions on Automatic Control.
[31] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[32] Geoffrey E. Hinton, et al. Reinforcement Learning with Factored States and Actions, 2004, J. Mach. Learn. Res.
[33] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[34] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[35] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[36] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[38] Yee Whye Teh, et al. An Analysis of Categorical Distributional Reinforcement Learning, 2018, AISTATS.
[39] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[40] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.