Continuous Value Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) For Learning Multi-Goal, Continuous Action and State Space Controllers
[1] N. Altman. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression , 1992 .
[2] John N. Tsitsiklis, et al. Actor-Critic Algorithms , 1999, NIPS.
[3] Martin A. Riedmiller, et al. Neural Reinforcement Learning Controllers for a Real Robot Application , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[4] Guy Lever, et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[5] Yuval Tassa, et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[6] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[7] Hussein A. Abbass, et al. Multi-Task Deep Reinforcement Learning for Continuous Action Control , 2017, IJCAI.
[8] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[9] Marcin Andrychowicz, et al. Hindsight Experience Replay , 2017, NIPS.
[10] Peter Dayan, et al. Structure in the Space of Value Functions , 2002, Machine Learning.
[11] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning , 2015, Nature.
[13] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[14] Sean R. Eddy, et al. What is dynamic programming? , 2004, Nature Biotechnology.
[15] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[16] Y. Demiris, et al. From motor babbling to hierarchical learning by imitation: a robot developmental pathway , 2005 .
[17] Alec Radford, et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[18] Matthias Rolf, et al. Goal babbling for an efficient bootstrapping of inverse models in high dimensions , 2012 .
[19] Tom Schaul, et al. Universal Value Function Approximators , 2015, ICML.
[20] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[21] Liming Xiang, et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[22] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[23] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[24] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[25] Pierre-Yves Oudeyer, et al. Active learning of inverse models with intrinsically motivated goal exploration in robots , 2013, Robotics Auton. Syst.
[26] M. A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.