Breaking the Deadly Triad with a Target Network
[1] R. Sutton,et al. Average-Reward Off-Policy Policy Evaluation with Function Approximation , 2021, ICML.
[2] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints , 2021, NeurIPS.
[3] R. Sutton,et al. Learning and Planning in Average-Reward Markov Decision Processes , 2020, ICML.
[4] Yue Wang,et al. Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise , 2020, UAI.
[5] Quanquan Gu,et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods , 2020, NeurIPS.
[6] Quanquan Gu,et al. A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation , 2019, ICML.
[7] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[8] Niao He,et al. A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms , 2019, ArXiv.
[9] Hengshuai Yao,et al. Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation , 2019, ICML.
[10] Shalabh Bhatnagar,et al. A Convergent Off-Policy Temporal Difference Algorithm , 2019, ECAI.
[11] Ana Busic,et al. Zap Q-Learning With Nonlinear Function Approximation , 2019, NeurIPS.
[12] Ruosong Wang,et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle , 2019, NeurIPS.
[13] Thinh T. Doan,et al. Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning , 2019, Autom..
[14] Jason D. Lee,et al. Neural Temporal Difference and Q Learning Provably Converge to Global Optima , 2019, Mathematics of Operations Research.
[15] Mengdi Wang,et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[16] Gabriel Dulac-Arnold,et al. Challenges of Real-World Reinforcement Learning , 2019, ArXiv.
[17] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[18] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[19] Shaofeng Zou,et al. Finite-Sample Analysis for SARSA with Linear Function Approximation , 2019, NeurIPS.
[20] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[21] Matteo Hessel,et al. Deep Reinforcement Learning and the Deadly Triad , 2018, ArXiv.
[22] K. Zhang,et al. Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective , 2018 .
[23] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[24] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[25] Huizhen Yu,et al. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning , 2017, ArXiv.
[26] Lihong Li,et al. Stochastic Variance Reduction Methods for Policy Evaluation , 2017, ICML.
[27] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[28] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[29] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[30] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[31] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[32] Bo Liu,et al. Regularized Off-Policy TD-Learning , 2012, NIPS.
[33] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[34] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[35] Ronald Parr,et al. Linear Complementarity for Regularized Policy Evaluation and Improvement , 2010, NIPS.
[36] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[37] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[38] Marek Petrik,et al. Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes , 2010, ICML.
[39] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[40] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[41] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[42] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[43] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[44] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[45] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[46] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[47] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[48] Benjamin Van Roy,et al. Feature-based methods for large scale dynamic programming , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[49] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[50] A. Tikhonov,et al. Numerical Methods for the Solution of Ill-Posed Problems , 1995 .
[51] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[52] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[53] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[54] C. A. Desoer,et al. Nonlinear Systems Analysis , 1978 .
[55] Diogo Carvalho,et al. A new convergent variant of Q-learning with linear function approximation , 2020, NeurIPS.
[56] Ruosong Wang,et al. Agnostic Q-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity , 2020 .
[57] Ronald E. Parr,et al. L1 Regularized Linear Temporal Difference Learning , 2012 .
[58] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[59] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[60] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.