[1] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks , 2018, NeurIPS.
[2] Jalaj Bhandari, et al. A Note on the Linear Convergence of Policy Gradient Methods , 2020, ArXiv.
[3] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective , 2018 .
[4] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[5] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[8] Alec Radford, et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[9] Richard S. Sutton, et al. Introduction to Reinforcement Learning , 1998 .
[10] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[11] Dale Schuurmans, et al. On the Global Convergence Rates of Softmax Policy Gradient Methods , 2020, ICML.
[12] Marcello Restelli, et al. Boosted Fitted Q-Iteration , 2017, ICML.
[13] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[14] Sham M. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[15] Bo Dai, et al. Reinforcement Learning via Fenchel-Rockafellar Duality , 2020, ArXiv.
[16] Nevena Lazic, et al. Exploration-Enhanced POLITEX , 2019, ArXiv.
[17] Vivek S. Borkar, et al. The actor-critic algorithm as multi-time-scale stochastic approximation , 1997 .
[18] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy , 2019, ArXiv.
[19] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[20] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[21] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[22] Hao Zhu, et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies , 2019, SIAM J. Control. Optim.
[23] Martin J. Wainwright, et al. Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems , 2018, AISTATS.
[24] Yuan Cao, et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[25] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[26] Stefan Schaal, et al. Natural Actor-Critic , 2003, Neurocomputing.
[27] Hamid Reza Maei, et al. Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation , 2018, ArXiv.
[28] Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods , 2019, ArXiv.
[29] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[30] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values , 2020, ICLR.
[31] Yongxin Chen, et al. Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games , 2019, ICLR.
[32] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[33] Jalaj Bhandari, et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[34] Vivek S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[35] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[36] Zhaoran Wang, et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic , 2020, ArXiv.
[37] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients , 2008 .
[38] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization , 2019, ICML.
[39] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.
[40] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[41] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network , 2017, NIPS.
[42] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res.
[43] Yongxin Chen, et al. On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost , 2019, ArXiv.
[44] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[45] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[46] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections , 2019, NeurIPS.
[47] Quanquan Gu, et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods , 2020, NeurIPS.
[48] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for Linearized Control Problems , 2018, ICML.
[49] Etienne Perot, et al. Deep Reinforcement Learning framework for Autonomous Driving , 2017, Autonomous Vehicles and Machines.
[50] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence , 2019, ICLR.
[51] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[52] Yingbin Liang, et al. Finite-Sample Analysis for SARSA with Linear Function Approximation , 2019, NeurIPS.
[53] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[54] Yuxi Li, et al. Deep Reinforcement Learning: An Overview , 2017, ArXiv.
[55] Yingbin Liang, et al. Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples , 2019, NeurIPS.
[56] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[57] Zhe Wang, et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms , 2020, ArXiv.
[58] Ron Meir, et al. A Convergent Online Single Time Scale Actor Critic Algorithm , 2009, J. Mach. Learn. Res.
[59] Geoffrey E. Hinton, et al. Deep Learning , 2015, Nature.
[60] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res.
[61] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[62] Shalabh Bhatnagar, et al. Natural actor-critic algorithms , 2009, Autom.
[63] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res.
[64] Pierre Baldi, et al. Solving the Rubik's cube with deep reinforcement learning and search , 2019, Nature Machine Intelligence.
[65] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[66] Sergey Levine, et al. Trust Region Policy Optimization , 2015, ICML.
[67] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[68] Francis Bach, et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[69] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[70] Cho-Jui Hsieh, et al. Convergence of Adversarial Training in Overparametrized Networks , 2019, ArXiv.
[71] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[72] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[73] Joel A. Tropp, et al. An Introduction to Matrix Concentration Inequalities , 2015, Found. Trends Mach. Learn.
[74] Bruno Scherrer. On the Performance Bounds of some Policy Search Dynamic Programming Algorithms , 2013, ArXiv.
[75] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling , 2019, NeurIPS.
[76] Cho-Jui Hsieh, et al. Convergence of Adversarial Training in Overparametrized Neural Networks , 2019, NeurIPS.
[77] András Antos, et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[78] Sham M. Kakade, et al. A Natural Policy Gradient , 2001, NIPS.
[79] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction , 2019, ICML.
[80] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[81] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[82] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[83] Benjamin Recht, et al. The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint , 2018, COLT.
[84] Demis Hassabis, et al. Mastering the game of Go without human knowledge , 2017, Nature.
[85] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[86] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res.
[87] Mehran Mesbahi, et al. LQR through the Lens of First Order Methods: Discrete-time Case , 2019, ArXiv.
[88] Nevena Lazic, et al. Provably Efficient Adaptive Approximate Policy Iteration , 2020, ArXiv.
[89] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[90] Shalabh Bhatnagar. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes , 2010, Syst. Control. Lett.
[91] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.