Stochastic Activation Actor Critic Methods
Max Welling | Herke van Hoof | Wenling Shang | Douwe van der Wal
[1] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[2] Sergey Levine, et al. Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2017, ArXiv.
[3] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[4] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[5] Henryk Michalewski, et al. Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes, 2018, ISC.
[6] Max Welling, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[7] Emanuel Todorov, et al. General duality between optimal control and estimation, 2008, 47th IEEE Conference on Decision and Control.
[8] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[9] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[10] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract), 2018, IJCAI.
[11] Kevin P. Murphy. A Survey of POMDP Solution Techniques, 2000.
[12] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[13] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[14] Honglak Lee, et al. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units, 2016, ICML.
[15] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.
[16] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.
[17] Samy Bengio, et al. Revisiting Distributed Synchronous SGD, 2016, ArXiv.
[18] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[19] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[20] Yuandong Tian, et al. Latent forward model for Real-time Strategy game planning with incomplete information, 2018.
[21] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[22] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[23] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[24] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[26] Nicholas Roy, et al. The Belief Roadmap: Efficient Planning in Linear POMDPs by Factoring the Covariance, 2007, ISRR.
[27] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[28] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[29] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[30] Demis Hassabis, et al. Neural Episodic Control, 2017, ICML.
[31] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[32] Max Welling, et al. Bayesian Compression for Deep Learning, 2017, NIPS.
[33] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[34] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[35] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[36] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[37] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[38] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[39] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.