Discrete Action On-Policy Learning with Action-Value Critic
Yunhao Tang | Mingzhang Yin | Yuguang Yue | Mingyuan Zhou
[1] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[2] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[3] Patrick MacAlpine,et al. UT Austin Villa: RoboCup 2016 3D Simulation League Competition and Technical Challenges Champions , 2015, Robot Soccer World Cup.
[4] Yujing Hu,et al. Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application , 2018, KDD.
[5] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[6] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[7] Tapani Raiko,et al. Techniques for Learning Binary Stochastic Feedforward Neural Networks , 2014, ICLR.
[8] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res.
[9] Jascha Sohl-Dickstein,et al. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models , 2017, NIPS.
[10] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[11] Yunhao Tang,et al. Discretizing Continuous Action Space for On-Policy Optimization , 2019, AAAI.
[12] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[13] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[14] Mingyuan Zhou,et al. ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks , 2018, ICLR.
[15] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[16] David M. Blei,et al. Variational Inference: A Review for Statisticians , 2016, ArXiv.
[17] Dengyong Zhou,et al. Action-dependent Control Variates for Policy Optimization via Stein's Identity , 2017.
[18] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[19] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[20] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[21] David Duvenaud,et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation , 2017, ICLR.
[22] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[23] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[24] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[25] M. de Rijke,et al. Reinforcement Learning to Rank , 2019, WSDM.
[26] Patrick MacAlpine,et al. UT Austin Villa: RoboCup 2015 3D Simulation League Competition and Technical Challenges Champions , 2015, RoboCup.
[27] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[28] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, arXiv:1512.07679.
[29] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[30] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[31] Yang Liu,et al. Stein Variational Policy Gradient , 2017, UAI.
[32] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[33] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[34] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[35] Alexandre M. Bayen,et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines , 2018, ICLR.
[36] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[37] Mingyuan Zhou,et al. ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables , 2019, ICML.
[38] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res.
[39] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[40] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.