Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization