Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments

Reinforcement learning (RL) is often unsuccessful in environments with sparse extrinsic rewards. A possible countermeasure is to endow RL agents with an intrinsic reward function, or ‘intrinsic motivation’, which rewards the agent based on certain features of the current sensor state. An intrinsic reward function based on the principle of empowerment assigns rewards in proportion to the amount of control the agent has over its own sensors. We implemented a variation on a recently proposed intrinsically motivated agent, which we refer to as the ‘curious’ agent, and an empowerment-inspired agent. The former encodes sensor states with a variational autoencoder, while the latter predicts the next sensor state through a variational information bottleneck. We compared the performance of both agents to that of an advantage actor-critic baseline in four sparse-reward grid worlds. Both the empowerment agent and its curious competitor appear to benefit from their intrinsic rewards to a similar extent. This lends some experimental support to the conjecture that empowerment can be used to drive exploration.
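For concreteness: empowerment is standardly defined (following Klyubin, Polani, and Nehaniv, 2005) as the channel capacity between an agent’s actions and its subsequent sensor states, E(s) = max_{p(a)} I(A; S′ | s). The sketch below shows one common variational lower bound on this quantity, in the spirit of Mohamed and Rezende’s variational information maximisation: I(A; S′) ≥ H(A) + E[log q(a | s, s′)], which for uniformly sampled actions yields the per-transition bonus log q(a | s, s′) + log n_actions. This is only an illustrative sketch, not the agent evaluated in the paper (which predicts the next sensor state through a variational information bottleneck); it assumes PyTorch and a discrete action set, and the names `InverseModel` and `empowerment_bonus` are hypothetical.

```python
# Minimal sketch of an empowerment-style intrinsic reward via a variational
# lower bound on I(A; S'). Assumes PyTorch and discrete actions; all names
# are hypothetical, not taken from the paper.
import math
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Variational 'source' distribution q(a | s, s') over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        # Score each action given the observed transition (s -> s').
        return self.net(torch.cat([s, s_next], dim=-1))  # action logits

def empowerment_bonus(q: InverseModel, s: torch.Tensor, a: torch.Tensor,
                      s_next: torch.Tensor, n_actions: int) -> torch.Tensor:
    """Intrinsic reward: how much better the executed action can be recovered
    from the observed transition than by guessing under a uniform prior."""
    log_q = torch.log_softmax(q(s, s_next), dim=-1)
    log_q_a = log_q.gather(-1, a.unsqueeze(-1)).squeeze(-1)  # log q(a | s, s')
    return log_q_a + math.log(n_actions)  # = log q(a|s,s') - log p(a), p uniform
```

In practice, q would be fitted with a cross-entropy loss on observed transitions, and the resulting bonus, suitably scaled, added to the sparse extrinsic reward.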
