Geometric Insights into the Convergence of Nonlinear TD Learning
[1] Samet Oymak,et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks , 2019, IEEE Journal on Selected Areas in Information Theory.
[2] V. Borkar. Stochastic approximation with two time scales , 1997 .
[3] Yann Ollivier,et al. Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies , 2018, ArXiv.
[4] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[5] Sergey Levine,et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms , 2019, ICML.
[6] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[7] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[8] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[9] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[10] Tomaso A. Poggio,et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks , 2017, AISTATS.
[11] H. Robbins, S. Monro. A Stochastic Approximation Method , 1951, Annals of Mathematical Statistics.
[12] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[14] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[15] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[16] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[17] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[18] Jianfeng Lu,et al. Temporal-difference learning for nonlinear value function approximation in the lazy training regime , 2019, ArXiv.
[19] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[20] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[21] Qi Cai,et al. Neural Temporal-Difference Learning Converges to Global Optima , 2019, NeurIPS.
[22] Francis Bach,et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[23] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[24] Pieter Abbeel,et al. Towards Characterizing Divergence in Deep Q-Learning , 2019, ArXiv.
[25] Martha White,et al. Two-Timescale Networks for Nonlinear Value Function Approximation , 2019, ICLR.
[26] Ohad Shamir,et al. Are ResNets Provably Better than Linear Predictors? , 2018, NeurIPS.
[27] K. Zhang,et al. Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective , 2018 .
[28] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Charles R. Johnson,et al. Topics in Matrix Analysis , 1991 .