Action-dependent Control Variates for Policy Optimization via Stein's Identity
Dengyong Zhou | Qiang Liu | Yihao Feng | Hao Liu | Yi Mao | Jian Peng
[1] C. Stein. Approximate computation of expectations, 1986.
[2] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[7] Louis H. Y. Chen, et al. An Introduction to Stein's Method, 2005.
[8] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[9] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[10] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[11] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[12] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[13] N. Chopin, et al. Control functionals for Monte Carlo integration, 2014, 1410.2392.
[14] Lester W. Mackey, et al. Measuring Sample Quality with Stein's Method, 2015, NIPS.
[15] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[17] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[18] M. Girolami, et al. Control Functionals for Quasi-Monte Carlo Integration, 2015, AISTATS.
[19] Anima Anandkumar, et al. Provable Tensor Methods for Learning Mixtures of Generalized Linear Models, 2014, AISTATS.
[20] Qiang Liu, et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests, 2016, ICML.
[21] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[22] Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.
[23] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[24] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[25] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[26] Arthur Gretton, et al. A Kernel Test of Goodness of Fit, 2016, ICML.
[27] Dilin Wang, et al. Learning to Draw Samples with Amortized Stein Variational Gradient Descent, 2017, UAI.
[28] Qiang Liu, et al. Black-box Importance Sampling, 2016, AISTATS.
[29] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[30] Philip S. Thomas, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines, 2017, ArXiv.
[31] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.
[32] M. Littman, et al. Mean Actor Critic, 2017, ArXiv.
[33] Jascha Sohl-Dickstein, et al. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models, 2017, NIPS.
[34] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[35] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[36] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[37] David Duvenaud, et al. Sticking the Landing: An Asymptotically Zero-Variance Gradient Estimator for Variational Inference, 2017, ArXiv.
[38] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[39] David Duvenaud, et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation, 2017, ICLR.