On the Global Convergence Rates of Softmax Policy Gradient Methods