Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Yuhuai Wu | Elman Mansimov | Shun Liao | Roger B. Grosse | Jimmy Ba
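The paper's central device is approximating each layer's Fisher-matrix block as a Kronecker product of two small covariance matrices (K-FAC), which makes the natural-gradient step cheap. A minimal sketch of that idea follows — this is an illustration of the Kronecker identity the method relies on, not the authors' implementation; all variable names and the damping value are illustrative assumptions:

```python
import numpy as np

# K-FAC approximates a layer's Fisher block as F ≈ A ⊗ S, where
# A = E[a aᵀ] covers the layer's input activations and
# S = E[g gᵀ] covers the gradients w.r.t. its pre-activations.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3

a = rng.normal(size=(256, n_in))    # sampled input activations
g = rng.normal(size=(256, n_out))   # sampled pre-activation gradients
A = a.T @ a / len(a)                # (n_in, n_in) factor
S = g.T @ g / len(g)                # (n_out, n_out) factor

G = rng.normal(size=(n_out, n_in))  # weight gradient for this layer

# Damped factors (Tikhonov damping, value chosen arbitrarily here).
damping = 1e-2
A_d = A + damping * np.eye(n_in)
S_d = S + damping * np.eye(n_out)

# Kronecker identity (column-major vec): (A ⊗ S)⁻¹ vec(G) = vec(S⁻¹ G A⁻¹),
# so the natural-gradient step needs only two small inverses,
# never the full (n_in·n_out)² Fisher matrix.
nat_grad = np.linalg.inv(S_d) @ G @ np.linalg.inv(A_d)

# Sanity check against the explicit Kronecker-factored Fisher.
F = np.kron(A_d, S_d)
check = np.linalg.solve(F, G.flatten(order="F")).reshape(G.shape, order="F")
assert np.allclose(nat_grad, check)
```

The payoff is the cost structure: inverting `A_d` and `S_d` is O(n_in³ + n_out³), versus O((n_in·n_out)³) for the explicit block, which is what makes the trust-region update scalable to deep networks.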
[1] Shun-ichi Amari et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[2] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[3] Sham M. Kakade et al. A Natural Policy Gradient, 2001, NIPS.
[4] D. K. Smith et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[5] Nicol N. Schraudolph et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[6] Jeff G. Schneider et al. Covariant Policy Search, 2003, IJCAI.
[7] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[8] Stefan Schaal et al. Natural Actor-Critic, 2003, Neurocomputing.
[9] James Martens et al. Deep Learning via Hessian-Free Optimization, 2010, ICML.
[10] Yuval Tassa et al. MuJoCo: A Physics Engine for Model-Based Control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] James Martens et al. New Perspectives on the Natural Gradient Method, 2014, ArXiv.
[12] Sergey Levine et al. Trust Region Policy Optimization, 2015, ICML.
[13] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Roger B. Grosse et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[15] Shane Legg et al. Human-Level Control through Deep Reinforcement Learning, 2015, Nature.
[16] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[17] Yuval Tassa et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[18] Tom Schaul et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[19] Roger B. Grosse et al. A Kronecker-factored Approximate Fisher Matrix for Convolution Layers, 2016, ICML.
[20] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[21] Demis Hassabis et al. Mastering the Game of Go with Deep Neural Networks and Tree Search, 2016, Nature.
[22] Sergey Levine et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[23] Sepp Hochreiter et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[24] Wojciech Zaremba et al. OpenAI Gym, 2016, ArXiv.
[25] Roger B. Grosse et al. Distributed Second-Order Optimization Using Kronecker-Factored Approximations, 2016, ICLR.
[26] Sergey Levine et al. Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic, 2016, ICLR.
[27] Nando de Freitas et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[28] Yuval Tassa et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[29] Tom Schaul et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[30] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.