Policy Optimization with Stochastic Mirror Descent
Gang Pan, Qian Zheng, Pengfei Li, Long Yang, Gang Zheng, Jianhang Huang, Yu Zhang
[1] Pengfei Li et al. CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning, 2022, arXiv.
[2] Heng Huang et al. Bregman Gradient Policy Optimization, 2021, ICLR.
[3] Gang Pan et al. Sample Complexity of Policy Gradient Finding Second-Order Stationary Points, 2020, AAAI.
[4] M. Ghavamzadeh et al. Mirror Descent Policy Optimization, 2020, ICLR.
[5] Quanquan Gu et al. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction, 2019, ICLR.
[6] Shie Mannor et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2019, AAAI.
[7] Ching-An Cheng et al. Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods, 2019, CoRL.
[8] S. Kakade et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[9] Qi Cai et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, arXiv.
[10] Quanquan Gu et al. An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient, 2019, UAI.
[11] Alejandro Ribeiro et al. Hessian Aided Policy Gradient, 2019, ICML.
[12] Byron Boots et al. Predictor-Corrector Policy Optimization, 2018, ICML.
[13] Yuren Zhou et al. Policy Optimization via Stochastic Recursive Gradient Algorithm, 2018.
[14] Marcello Restelli et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[15] Hongzi Mao et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments, 2018, ICLR.
[16] Pengfei Li et al. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[17] Marcello Restelli et al. Stochastic Variance-Reduced Policy Gradient, 2018, ICML.
[18] Herke van Hoof et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[19] Alexandre M. Bayen et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[20] Gang Pan et al. A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning, 2018, IJCAI.
[21] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[22] Le Song et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[23] Prateek Jain et al. Non-convex Optimization for Machine Learning, 2017, Found. Trends Mach. Learn.
[24] David Duvenaud et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation, 2017, ICLR.
[25] Demis Hassabis et al. Mastering the game of Go without human knowledge, 2017, Nature.
[26] Jian Peng et al. Stochastic Variance Reduction for Policy Gradient Estimation, 2017, arXiv.
[27] Jie Liu et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.
[28] Lihong Li et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[29] Sergey Levine et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[30] Pieter Abbeel et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[31] Alexander J. Smola et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[32] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[33] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[34] David Silver et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[35] Sergey Levine et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[36] Michael I. Jordan et al. Trust Region Policy Optimization, 2015, ICML.
[37] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[38] Guy Lever et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[39] Tong Zhang et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[40] Saeed Ghadimi et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.
[41] Heinz H. Bauschke et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[42] Jan Peters et al. Relative Entropy Policy Search, 2010, AAAI.
[43] Dimitri P. Bertsekas. Convex Optimization Theory, 2009.
[44] Marc Teboulle et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[45] Lex Weaver et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[46] Peter L. Bartlett et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[47] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[48] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[49] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[50] Qian Zheng et al. Learning with Generated Teammates to Achieve Type-Free Ad-Hoc Teamwork, 2021, IJCAI.
[51] Hao Liu et al. Action-dependent Control Variates for Policy Optimization via Stein Identity, 2018, ICLR.
[52] Ke Tang et al. Stochastic Composite Mirror Descent: Optimal Bounds with High Probabilities, 2018, NeurIPS.
[53] G. Crooks. On Measures of Entropy and Information, 2015.
[54] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[55] Vijay R. Konda et al. Actor-Critic Algorithms, 1999, NIPS.
[56] C. Stein. Approximate computation of expectations, 1986.
[57] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983, Wiley.