[1] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[2] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[3] Marcello Restelli, et al. Stochastic Variance-Reduced Policy Gradient, 2018, ICML.
[4] Quanquan Gu, et al. An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient, 2019, UAI.
[5] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[6] Aaron Sidford, et al. Efficiently Solving MDPs with Stochastic Mirror Descent, 2020, ICML.
[7] Niao He, et al. On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization, 2018, arXiv:1806.04781.
[8] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[9] Mohammad Ghavamzadeh, et al. Mirror Descent Policy Optimization, 2020, ArXiv.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Francesco Orabona, et al. Momentum-Based Variance Reduction in Non-Convex SGD, 2019, NeurIPS.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Amnon Shashua, et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, 2016, ArXiv.
[14] Yu Zhang, et al. Policy Optimization with Stochastic Mirror Descent, 2019, ArXiv.
[15] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[16] Guanghui Lan. Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes, 2021, ArXiv.
[17] Brendan O'Donoghue, et al. Sample Efficient Reinforcement Learning with REINFORCE, 2020, AAAI.
[18] Hao Zhu, et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies, 2019, SIAM J. Control Optim.
[19] Claude E. Shannon. A Mathematical Theory of Communication, 1948, Bell Syst. Tech. J.
[20] Wotao Yin, et al. An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods, 2022, NeurIPS.
[21] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[22] Hilbert J. Kappen, et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[23] Feihu Huang, et al. SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients, 2021, ArXiv.
[24] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[25] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[26] Quanquan Gu, et al. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction, 2019, ICLR.
[27] Feihu Huang, et al. Momentum-Based Policy Gradient Methods, 2020, ICML.
[28] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[29] Yuejie Chi, et al. Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence, 2021, ArXiv.
[30] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[31] Yuxi Li. Deep Reinforcement Learning: An Overview, 2017, ArXiv.
[32] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[33] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[34] Yuxin Chen, et al. Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization, 2020, Oper. Res.
[35] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[36] Tong Zhang, et al. Divergence-Augmented Policy Optimization, 2019, NeurIPS.
[37] Alejandro Ribeiro, et al. Hessian Aided Policy Gradient, 2019, ICML.
[38] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[39] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[40] Yishay Mansour, et al. Learning Bounds for Importance Weighting, 2010, NIPS.
[41] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[42] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.