暂无分享,去创建一个
[1] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[2] Martha White,et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings , 2018, NeurIPS.
[3] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[4] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[5] Emma Brunskill,et al. Off-Policy Policy Gradient with State Distribution Correction , 2019, UAI 2019.
[6] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[7] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[8] Marc G. Bellemare,et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift , 2019, AAAI.
[9] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[10] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[11] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[12] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[13] Shie Mannor,et al. Consistent On-Line Off-Policy Evaluation , 2017, ICML.
[14] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[15] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[16] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[17] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.