Striving for Simplicity and Performance in Off-Policy DRL: Output Normalization and Non-Uniform Sampling