Paper Information - Unsupervised Control Through Non-Parametric Discriminative Rewards

Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm that trains agents to achieve perceptually specified goals using only a stream of observations and actions. Our agent simultaneously learns a goal-conditioned policy and a goal achievement reward function that measures how similar a state is to the goal state. This dual optimization leads to a co-operative game, giving rise to a learned reward function that reflects similarity in controllable aspects of the environment instead of distance in the space of observations. We demonstrate that our agent learns, in an unsupervised manner, to reach a diverse set of goals on three domains: Atari, the DeepMind Control Suite, and DeepMind Lab.
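To make the idea of a goal achievement reward more concrete, the following is a minimal sketch of one way such a reward could be computed. It is not taken from the paper: the abstract does not specify the form of the reward, so the use of cosine similarity, a softmax over the true goal plus a set of decoy goals, the `temperature` parameter, and all function names are assumptions made for illustration. The embeddings are assumed to come from some learned encoder, which is omitted here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length so dot products act as cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def goal_probabilities(state_emb, candidate_embs, temperature=0.1):
    """Softmax over cosine similarities between one state and several candidate goals.

    Assumption: candidate_embs has shape (K, D) and state_emb has shape (D,).
    """
    sims = l2_normalize(candidate_embs) @ l2_normalize(state_emb)
    logits = sims / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def discriminative_reward(state_emb, goal_emb, decoy_embs, temperature=0.1):
    """Hypothetical reward: probability that the state is matched to the true goal
    rather than to any of the decoy goals."""
    goal_emb = np.asarray(goal_emb, dtype=float)
    candidates = np.vstack([goal_emb[None, :], np.asarray(decoy_embs, dtype=float)])
    return float(goal_probabilities(state_emb, candidates, temperature)[0])


def discriminator_loss(state_emb, goal_emb, decoy_embs, temperature=0.1):
    """Cross-entropy loss for identifying the true goal among the decoys.

    Minimizing this with respect to the (omitted) encoder would be the
    reward-learning half of the dual optimization in this sketch.
    """
    return -np.log(discriminative_reward(state_emb, goal_emb, decoy_embs, temperature))
```

In this sketch the policy would be trained to maximize `discriminative_reward`, while the encoder producing the embeddings would be trained to minimize `discriminator_loss`. Both objectives improve when the reached state is easy to tell apart from the decoys, which is one way to read the co-operative game described above. Because the comparison is made against other goals rather than by pixel distance, the reward depends on whatever features the encoder learns to distinguish.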
