TD Convergence: An Optimization Perspective