暂无分享,去创建一个
[1] Jalaj Bhandari,et al. A Note on the Linear Convergence of Policy Gradient Methods , 2020, ArXiv.
[2] Cho-Jui Hsieh,et al. Convergence of Adversarial Training in Overparametrized Neural Networks , 2019, NeurIPS.
[3] Nevena Lazic,et al. Provably Efficient Adaptive Approximate Policy Iteration , 2020, ArXiv.
[4] Yuxi Li,et al. Deep Reinforcement Learning: An Overview , 2017, ArXiv.
[5] Martin J. Wainwright,et al. Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems , 2018, AISTATS.
[6] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .
[7] Yongxin Chen,et al. On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost , 2019, ArXiv.
[8] Yuan Cao,et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[9] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[10] Bo Dai,et al. GenDICE: Generalized Offline Estimation of Stationary Values , 2020, ICLR.
[11] Sham M. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[12] Quanquan Gu,et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[13] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[14] Sham M. Kakade,et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res..
[15] Bo Dai,et al. Reinforcement Learning via Fenchel-Rockafellar Duality , 2020, ArXiv.
[16] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[17] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2020, ICML.
[18] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[19] Vivek S. Borkar,et al. The actor-critic algorithm as multi-time-scale stochastic approximation , 1997 .
[20] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.
[21] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[22] Pierre Baldi,et al. Solving the Rubik’s cube with deep reinforcement learning and search , 2019, Nature Machine Intelligence.
[23] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[24] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[25] Jalaj Bhandari,et al. Global Optimality Guarantees For Policy Gradient Methods , 2019, ArXiv.
[26] Qi Cai,et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy , 2019, ArXiv.
[27] Yongxin Chen,et al. Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games , 2019, ICLR.
[28] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[29] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[30] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for Linearized Control Problems , 2018, ICML 2018.
[31] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[32] Yifei Ma,et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling , 2019, NeurIPS.
[33] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[34] Yingbin Liang,et al. Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples , 2019, NeurIPS.
[35] Benjamin Recht,et al. The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint , 2018, COLT.
[36] Bo Dai,et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections , 2019, NeurIPS.
[37] Joel A. Tropp,et al. An Introduction to Matrix Concentration Inequalities , 2015, Found. Trends Mach. Learn..
[38] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[39] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[40] Dale Schuurmans,et al. On the Global Convergence Rates of Softmax Policy Gradient Methods , 2020, ICML.
[41] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[42] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[43] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[44] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[45] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[46] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[47] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[48] Francis Bach,et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[49] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[50] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[51] Yingbin Liang,et al. Finite-Sample Analysis for SARSA with Linear Function Approximation , 2019, NeurIPS.
[52] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[53] Weitong Zhang,et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods , 2020, NeurIPS.
[54] Cho-Jui Hsieh,et al. Convergence of Adversarial Training in Overparametrized Networks , 2019, ArXiv.
[55] Bruno Scherrer. On the Performance Bounds of some Policy Search Dynamic Programming Algorithms , 2013, ArXiv.
[56] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[57] Peter L. Bartlett,et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction , 2019, ICML.
[58] Hamid Reza Maei,et al. Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation , 2018, ArXiv.
[59] Zhaoran Wang,et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic , 2020, ArXiv.
[60] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[61] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[62] Etienne Perot,et al. Deep Reinforcement Learning framework for Autonomous Driving , 2017, Autonomous Vehicles and Machines.
[63] Zhe Wang,et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms , 2020, ArXiv.
[64] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[65] Mehran Mesbahi,et al. LQR through the Lens of First Order Methods: Discrete-time Case , 2019, ArXiv.
[66] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[67] Shalabh Bhatnagar,et al. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes , 2010, Syst. Control. Lett..
[68] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[69] Zhaoran Wang,et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence , 2019, ICLR.
[70] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[71] Marcello Restelli,et al. Boosted Fitted Q-Iteration , 2017, ICML.
[72] Amit Daniely,et al. SGD Learns the Conjugate Kernel Class of the Network , 2017, NIPS.
[73] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[74] Ron Meir,et al. A Convergent Online Single Time Scale Actor Critic Algorithm , 2009, J. Mach. Learn. Res..
[75] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[76] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[77] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning : A Dynamical Stability Perspective , 2018 .
[78] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[79] Masatoshi Uehara,et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[80] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[81] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[82] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[83] Hao Zhu,et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies , 2019, SIAM J. Control. Optim..
[84] Nevena Lazic,et al. Exploration-Enhanced POLITEX , 2019, ArXiv.
[85] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[86] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[87] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[88] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[89] A. Antos,et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[90] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .