[1] Harold R. Parks,et al. The Implicit Function Theorem , 2002, Birkhäuser.
[2] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[3] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[4] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[5] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[6] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[7] James Zou,et al. The Effects of Memory Replay in Reinforcement Learning , 2017, 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[8] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[9] Matteo Hessel,et al. Deep Reinforcement Learning and the Deadly Triad , 2018, ArXiv.
[10] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[12] Sean P. Meyn,et al. Zap Q-Learning , 2017, NIPS.
[13] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[14] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[15] Sergey Levine,et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning , 2019, CoRL.
[16] Guy Shani,et al. An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..
[17] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[18] Nicolas Le Roux,et al. Understanding the impact of entropy on policy optimization , 2018, ICML.
[19] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[20] Sham M. Kakade,et al. Provably Efficient Maximum Entropy Exploration , 2018, ICML.
[21] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[22] Richard S. Sutton,et al. A Deeper Look at Experience Replay , 2017, ArXiv.
[23] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[24] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[25] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[26] Karol Hausman,et al. Gradient Surgery for Multi-Task Learning , 2020, NeurIPS.
[27] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000, J. Optim. Theory Appl..
[28] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[29] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[30] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..
[31] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[32] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[33] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[34] Toru Maruyama. On Some Recent Developments in Convex Analysis (in Japanese) , 1977 .
[35] Jeff A. Bilmes,et al. Combating Label Noise in Deep Learning Using Abstention , 2019, ICML.
[36] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[37] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[38] Marc G. Bellemare,et al. Dopamine: A Research Framework for Deep Reinforcement Learning , 2018, ArXiv.
[39] Chunlin Chen,et al. A novel DDPG method with prioritized experience replay , 2017, 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
[40] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[41] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[42] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[43] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[44] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[45] Bruno Scherrer,et al. Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies , 2013, ArXiv.
[46] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML.
[47] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[48] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[49] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[50] C. J. C. H. Watkins. Learning from delayed rewards , 1989, PhD thesis, University of Cambridge.
[51] Razvan Pascanu,et al. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning , 2019, ArXiv.
[52] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[53] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[54] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[55] Sergey Levine,et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms , 2019, ICML.
[56] Martha White,et al. The Utility of Sparse Representations for Control in Reinforcement Learning , 2018, AAAI.
[57] Ruosong Wang,et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle , 2019, NeurIPS.
[58] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.