An Improved Trust-Region Method for Off-Policy Deep Reinforcement Learning
[1] Hepeng Li,et al. An Analytical Update Rule for General Policy Optimization , 2021, ICML.
[2] Nicolas Le Roux,et al. A general class of surrogate functions for stable and efficient reinforcement learning , 2021, AISTATS.
[3] Xiangnan Zhong,et al. A Reinforcement Learning-Based Control Approach for Unknown Nonlinear Systems with Persistent Adversarial Inputs , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[4] Wenjia Meng,et al. An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning , 2021, IEEE Transactions on Neural Networks and Learning Systems.
[5] Ngo Anh Vien,et al. Differentiable Trust Region Layers for Deep Reinforcement Learning , 2021, ICLR.
[6] Longbing Cao,et al. Maximum Entropy Reinforcement Learning with Evolution Strategies , 2020, 2020 International Joint Conference on Neural Networks (IJCNN).
[7] M. Ghavamzadeh,et al. Mirror Descent Policy Optimization , 2020, ICLR.
[8] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[9] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[10] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[11] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[12] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[13] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.
[14] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[15] J. Schulman,et al. OpenAI Gym , 2016, ArXiv.
[16] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[17] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[18] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[19] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[20] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[21] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[22] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[23] Michael I. Jordan,et al. Trust Region Policy Optimization , 2015, ICML.
[24] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[25] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[26] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[27] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[28] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[29] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[30] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[31] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[32] Michael I. Jordan,et al. Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient , 2021, ArXiv.
[33] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010.
[34] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.