Neural Temporal-Difference Learning Converges to Global Optima
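The paper listed here studies temporal-difference learning when the value function is an overparameterized neural network. As context for the citations below, here is a minimal sketch (not the paper's code) of semi-gradient TD(0) with a two-layer ReLU network on a toy Markov reward process; the network parameterization, problem sizes, and step size are illustrative assumptions, with the output-layer signs held fixed as in common NTK-style analyses.

```python
# A minimal sketch of neural TD(0): semi-gradient temporal-difference
# learning with a two-layer ReLU network on a small, randomly generated
# Markov reward process. All names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process (assumed, for illustration).
n_states, dim, width, gamma = 10, 4, 64, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transitions
r = rng.normal(size=n_states)                          # per-state rewards
phi = rng.normal(size=(n_states, dim))                 # state features

# Two-layer network V(s) = (1/sqrt(m)) * sum_i b_i * relu(w_i^T phi(s)),
# with hidden weights W trained and output signs b fixed.
W = rng.normal(size=(width, dim))
b = rng.choice([-1.0, 1.0], size=width)

def value(s):
    return b @ np.maximum(W @ phi[s], 0.0) / np.sqrt(width)

def grad_W(s):
    # Gradient of V(s) w.r.t. W: row i is b_i * 1{w_i^T phi(s) > 0} * phi(s) / sqrt(m).
    act = (W @ phi[s] > 0.0).astype(float)
    return np.outer(b * act, phi[s]) / np.sqrt(width)

# Semi-gradient TD(0): differentiate V(s) only, not the bootstrapped target.
s, lr = 0, 0.05
for t in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    delta = r[s] + gamma * value(s_next) - value(s)    # TD error
    W += lr * delta * grad_W(s)                        # semi-gradient step
    s = s_next

print("example values:", [round(value(s), 3) for s in range(n_states)])
```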
[1] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[2] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[3] Bruno Scherrer, et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, 2010, ICML.
[4] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[5] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] Pascal Vincent, et al. Convergent Tree-Backup and Retrace with Function Approximation, 2017, ICML.
[8] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[9] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[11] Patrick T. Harker, et al. Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications, 1990, Math. Program.
[12] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[13] Csaba Szepesvári, et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?, 2018, AISTATS.
[14] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[15] David Pfau, et al. Connecting Generative Adversarial Networks and Actor-Critic Methods, 2016, ArXiv.
[16] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[17] Alessandro Lazaric, et al. LSTD with Random Projections, 2010, NIPS.
[18] Matthieu Geist, et al. Algorithmic Survey of Parametric Value Function Approximation, 2013, IEEE Transactions on Neural Networks and Learning Systems.
[19] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[20] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[21] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[22] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[23] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[24] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming, 2018, ArXiv.
[25] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[26] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[27] Tie-Yan Liu, et al. Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting, 2017, NIPS.
[28] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[29] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[31] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[32] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[33] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[34] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[35] Thomas Brox, et al. TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning, 2018, ICLR.
[36] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[37] Yingbin Liang, et al. Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation, 2019, ArXiv.
[38] Jalaj Bhandari, et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation, 2018, COLT.
[39] Cong Ma, et al. A Selective Overview of Deep Learning, 2019, Statistical Science.
[40] Prateek Jain, et al. Non-convex Optimization for Machine Learning, 2017, Found. Trends Mach. Learn.
[41] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[42] V. Borkar, et al. Stochastic approximation, 2013, Resonance.
[43] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[44] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[45] R. Srikant, et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning, 2019, COLT.
[46] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[47] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.
[48] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[49] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[50] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[51] Dimitri P. Bertsekas, et al. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations, 2018, IEEE/CAA Journal of Automatica Sinica.
[52] Alex Smola, et al. Kernel methods in machine learning, 2007, math/0701907.
[53] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[54] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[55] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[56] Benjamin Recht, et al. Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator, 2017, ICML.
[57] Simon S. Du, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[58] A. Rahimi, et al. Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.
[59] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[60] Shie Mannor, et al. Finite Sample Analyses for TD(0) With Function Approximation, 2017, AAAI.
[61] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[62] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.