Yuxin Chen | Chen Cheng | Yuting Wei | Yuejie Chi | Shicong Cen
[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[4] Dale Schuurmans, et al. On the Global Convergence Rates of Softmax Policy Gradient Methods, 2020, ICML.
[5] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[6] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[7] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2019, COLT 2020.
[8] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[9] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[10] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[11] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[12] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[13] Amir G. Aghdam, et al. Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods, 2020, 59th IEEE Conference on Decision and Control (CDC).
[14] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[15] Eric Moulines, et al. Non-asymptotic Analysis of Biased Stochastic Approximation Scheme, 2019, COLT.
[16] Yuantao Gu, et al. Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction, 2022, IEEE Transactions on Information Theory.
[17] A. S. Nemirovsky, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[18] Changxiao Cai, et al. Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis, 2021.
[19] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[20] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[21] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[22] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[23] Dale Schuurmans, et al. Maximum Entropy Monte-Carlo Planning, 2019, NeurIPS.
[24] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[25] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Automatica.
[26] Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods, 2019, ArXiv.
[27] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[28] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[29] Armin Zare, et al. Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem, 2019, ArXiv.
[30] Hao Zhu, et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, 2019, SIAM J. Control. Optim.
[31] Bruno Scherrer, et al. Leverage the Average: an Analysis of Regularization in RL, 2020, ArXiv.
[32] Zhe Wang, et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms, 2020, ArXiv.
[33] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[34] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[35] Tamer Basar, et al. Policy Optimization for H2 Linear Control with H∞ Robustness Guarantee: Implicit Regularization and Global Convergence, 2020, L4DC.
[36] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[37] Benjamin Recht, et al. The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint, 2018, COLT.
[38] Michal Valko, et al. Planning in entropy-regularized Markov decision processes and games, 2019, NeurIPS.
[39] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[40] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[41] Kaiqing Zhang, et al. Policy Optimization for H2 Linear Control with H∞ Robustness Guarantee: Implicit Regularization and Global Convergence, 2019, SIAM J. Control. Optim.
[42] Qi Cai, et al. Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy, 2019, NeurIPS.
[43] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[44] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[45] Bin Hu, et al. Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems, 2020, American Control Conference (ACC).
[46] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[47] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[48] Yuantao Gu, et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[49] Jalaj Bhandari, et al. A Note on the Linear Convergence of Policy Gradient Methods, 2020, ArXiv.
[50] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[51] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[52] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[53] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[54] Yuantao Gu, et al. Softmax Policy Gradient Methods Can Take Exponential Time to Converge, 2021, COLT.
[55] Yurii Nesterov, et al. Primal-dual subgradient methods for convex problems, 2005, Math. Program.
[56] Quanquan Gu, et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods, 2020, NeurIPS.
[57] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[58] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[59] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[60] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[61] R. Bellman. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.