Quanquan Gu | Pan Xu | Felicia Gao
[1] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[2] Simon S. Du, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[4] Jie Liu, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.
[5] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[6] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[7] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, 2001, Springer.
[8] A. Rényi. On Measures of Entropy and Information, 1961, Berkeley Symposium on Mathematical Statistics and Probability.
[9] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[10] Michael I. Jordan, et al. Non-convex Finite-Sum Optimization Via SCSG Methods, 2017, NIPS.
[11] Jian Peng, et al. Stochastic Variance Reduction for Policy Gradient Estimation, 2017, arXiv.
[12] Zeyuan Allen-Zhu, et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.
[13] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[16] Mark W. Schmidt, et al. Stop Wasting My Gradients: Practical SVRG, 2015, NIPS.
[17] Quanquan Gu, et al. Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization, 2018, NeurIPS.
[18] Amnon Shashua, et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, 2016, arXiv.
[19] Sergey Levine, et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning, 2018, ICML.
[20] Marcello Restelli, et al. Stochastic Variance-Reduced Policy Gradient, 2018, ICML.
[21] Nolan Wagener, et al. Learning contact-rich manipulation skills with guided policy search, 2015, ICRA.
[22] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[23] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[24] Yishay Mansour, et al. Learning Bounds for Importance Weighting, 2010, NIPS.
[25] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[26] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[27] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[28] Marcello Restelli, et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[29] Frank Sehnke, et al. Parameter-exploring policy gradients, 2010, Neural Networks.
[30] Jian Li, et al. A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization, 2018, NeurIPS.
[31] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[32] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.
[33] Gang Niu, et al. Analysis and Improvement of Policy Gradient Estimation, 2011, NIPS.
[34] Stefan Schaal, et al. Reinforcement Learning of Motor Skills with Policy Gradients, 2008, Neural Networks.
[35] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[36] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[37] H. Robbins. A Stochastic Approximation Method, 1951, The Annals of Mathematical Statistics.
[38] Marcello Restelli, et al. Adaptive Batch Size for Safe Policy Gradients, 2017, NIPS.