An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
Yanli Liu | Kaiqing Zhang | Tamer Başar | Wotao Yin
[1] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[2] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[3] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[4] Martin Takáč, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017.
[5] Qi Cai, et al. Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy, 2019, NeurIPS.
[6] Jian Peng, et al. Stochastic Variance Reduction for Policy Gradient Estimation, 2017, ArXiv.
[7] D. Shanno. Conditioning of Quasi-Newton Methods for Function Minimization, 1970.
[8] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[9] Hao Zhu, et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, 2019, SIAM J. Control Optim.
[10] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework, 2012, SIAM J. Optim.
[11] Wotao Yin, et al. Acceleration of SVRG and Katyusha X by Inexact Preconditioning, 2019, ICML.
[12] Zeyuan Allen-Zhu, et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.
[13] Shiqian Ma, et al. Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization, 2014, SIAM J. Optim.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[16] Quanquan Gu, et al. An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient, 2019, UAI.
[17] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[18] Zhe Wang, et al. Improving Sample Complexity Bounds for Actor-Critic Algorithms, 2020, ArXiv.
[19] Zhe Wang, et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms, 2020, ArXiv.
[20] Weitong Zhang, et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods, 2020, NeurIPS.
[21] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[22] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[23] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[24] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[25] D. Goldfarb. A family of variable-metric methods derived by variational means, 1970.
[26] Jie Liu, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.
[27] C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations, 1970.
[28] Ji Liu, et al. Stochastic Recursive Momentum for Policy Gradient Methods, 2020, ArXiv.
[29] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[30] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[31] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[32] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[33] Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods, 2019, ArXiv.
[34] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[35] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[36] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[37] Zhaoran Wang, et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic, 2020, ArXiv.
[38] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[39] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[40] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[41] Michael I. Jordan, et al. A Linearly-Convergent Stochastic L-BFGS Algorithm, 2015, AISTATS.
[42] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[43] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[44] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[45] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[46] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[47] Yishay Mansour, et al. Learning Bounds for Importance Weighting, 2010, NIPS.
[48] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[49] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[50] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[51] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[52] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[53] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[54] Quanquan Gu, et al. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction, 2020, ICLR.
[55] R. Fletcher. A New Approach to Variable Metric Algorithms, 1970, Comput. J.
[56] Marcello Restelli, et al. Stochastic Variance-Reduced Policy Gradient, 2018, ICML.
[57] Anirban DasGupta, et al. The Exponential Family and Statistical Applications, 2011.
[58] Marten van Dijk, et al. A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning, 2020, AISTATS.
[59] Feihu Huang, et al. Momentum-Based Policy Gradient Methods, 2020, ICML.
[60] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML '08.
[61] Robert M. Gower, et al. Stochastic Block BFGS: Squeezing More Curvature out of Data, 2016, ICML.
[62] R. J. Williams, et al. Reinforcement learning is direct adaptive optimal control, 1991, IEEE Control Systems.