Continuous Value Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) for Learning Multi-Goal, Continuous Action and State Space Controllers

This paper presents a novel model-free reinforcement learning algorithm for learning behavior in continuous action, state, and goal spaces. The algorithm approximates optimal value functions using non-parametric estimators and efficiently learns to reach multiple arbitrary goals in deterministic and non-deterministic environments. To improve generalization in the goal space, we propose a novel sample augmentation technique. Using these methods, robots learn controllers faster and achieve better overall performance. We benchmark the proposed algorithms in simulation and on a real-world voltage-controlled robot that learns to maneuver in a non-observable Cartesian task space.
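The abstract names two ingredients whose mechanics a short sketch may help make concrete: a non-parametric estimate of the value function over continuous state-goal pairs, and a goal-space sample augmentation that replays experience as if visited states had been the goal. The Python sketch below is our own minimal illustration under stated assumptions, not the paper's implementation; the function names `knn_value` and `imaginary_relabel`, the Gaussian kernel, the bandwidth, the success tolerance `atol=0.05`, and the binary success reward are all assumptions made for the example.

```python
import numpy as np

def knn_value(query, memory_keys, memory_values, k=5, bandwidth=0.5):
    """Nadaraya-Watson style k-nearest-neighbour value estimate.

    memory_keys:   (N, d) array of stored (state, goal) feature vectors.
    memory_values: (N,) array of value targets for those entries.
    query:         (d,) feature vector of the (state, goal) pair to evaluate.
    """
    dists = np.linalg.norm(memory_keys - query, axis=1)
    idx = np.argsort(dists)[:k]                       # k closest neighbours
    w = np.exp(-(dists[idx] / bandwidth) ** 2)        # Gaussian kernel weights
    return np.dot(w, memory_values[idx]) / (w.sum() + 1e-8)

def imaginary_relabel(episode, n_extra=4, rng=np.random.default_rng(0)):
    """Augment an episode with imaginary goals (hindsight-style relabelling).

    episode: list of (state, action, next_state, goal) tuples. States visited
    later in the episode are replayed as if they had been the commanded goal,
    producing additional transitions that densify reward in the goal space.
    """
    augmented = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        # Original transition with an assumed binary success reward.
        augmented.append((s, a, s_next, goal,
                          float(np.allclose(s_next, goal, atol=0.05))))
        future = episode[t:]
        for _ in range(min(n_extra, len(future))):
            # Pick a later visited state and pretend it was the goal.
            _, _, imagined_goal, _ = future[rng.integers(len(future))]
            reward = float(np.allclose(s_next, imagined_goal, atol=0.05))
            augmented.append((s, a, s_next, imagined_goal, reward))
    return augmented
```

Relabelling with actually visited states guarantees that some augmented transitions carry a success signal even when the original goal was never reached, which is one plausible reading of how a goal-space augmentation can speed up multi-goal learning.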
