Qi Cai | Zhuoran Yang | Boyi Liu | Zhaoran Wang
[1] Qi Cai, et al. Neural Temporal-Difference Learning Converges to Global Optima, 2019, NeurIPS.
[2] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[3] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning with Linear Transition Models, 2019, ICML.
[4] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[7] Le Song, et al. Smoothed Dual Embedding Control, 2017, ArXiv.
[8] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[9] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[10] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming, 2018, ArXiv.
[11] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[12] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[13] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[14] Hao Zhu, et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, 2019, SIAM J. Control. Optim.
[15] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[16] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[17] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[18] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[19] Larry Rudolph, et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, 2018, ArXiv.
[20] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces, 2016, J. Mach. Learn. Res.
[21] Paul Wagner, et al. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration, 2011, NIPS.
[22] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[23] Tuo Zhao, et al. Toward Understanding the Importance of Noise in Training Neural Networks, 2019, ICML.
[24] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[25] Yuan Cao, et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks, 2019, ArXiv.
[26] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.
[27] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[28] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[29] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[30] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[31] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.
[32] Tuo Zhao, et al. Towards Understanding the Importance of Noise in Training Neural Networks, 2019, ICML.
[33] Jason D. Lee, et al. Neural Temporal-Difference and Q-Learning Provably Converge to Global Optima, 2019, arXiv:1905.10027.
[34] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.
[35] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[36] Sham M. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[37] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[38] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[39] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[40] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[41] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[42] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[43] Hilbert J. Kappen, et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[44] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[45] Paul Wagner, et al. Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result, 2013, NIPS.
[46] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[47] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[48] Mengdi Wang, et al. Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality, 2017, ArXiv.
[49] Oladimeji Farri, et al. Diagnostic Inferencing via Improving Clinical Concept Extraction with Deep Reinforcement Learning: A Preliminary Study, 2017, MLHC.
[50] Michael Rabadi, et al. Kernel Methods for Machine Learning, 2015.
[51] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[52] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[53] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[54] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[55] Marcello Restelli, et al. Boosted Fitted Q-Iteration, 2017, ICML.
[56] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[57] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[58] Etienne Perot, et al. Deep Reinforcement Learning framework for Autonomous Driving, 2017, Autonomous Vehicles and Machines.