Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution
[1] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Gang Niu,et al. Analysis and Improvement of Policy Gradient Estimation , 2011, NIPS.
[4] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[5] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[7] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[8] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[9] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[10] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[11] Stuart Barber,et al. All of Statistics: a Concise Course in Statistical Inference , 2005 .
[12] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[13] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[14] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[15] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[16] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[17] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[18] Razvan Pascanu,et al. Learning to Navigate in Complex Environments , 2016, ICLR.
[19] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[20] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[21] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[22] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res.
[23] Frank Sehnke,et al. Policy Gradients with Parameter-Based Exploration for Control , 2008, ICANN.
[24] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res.
[25] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[26] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[27] R. Bellman,et al. Dynamic Programming and Lagrange Multipliers , 1956, Proceedings of the National Academy of Sciences of the United States of America.
[28] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[29] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[30] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[31] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[32] Michael Kearns,et al. Bias-Variance Error Bounds for Temporal Difference Updates , 2000, COLT.
[33] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[34] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[35] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[36] H. Jeffreys. An invariant form for the prior probability in estimation problems , 1946, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[37] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[38] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[39] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[40] A. V. Herz,et al. Neural codes: firing rates and beyond , 1997, Proceedings of the National Academy of Sciences of the United States of America.
[41] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[42] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.