Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
Shota Ohnishi | Eiji Uchibe | Yotaro Yamaguchi | Kosuke Nakanishi | Yuji Yasui | Shin Ishii