Relative Variational Intrinsic Control

In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment. Existing skill learning methods use mutual information objectives to incentivize each skill to be diverse and distinguishable from the rest. However, if care is not taken to constrain the ways in which the skills are diverse, trivially diverse skill sets can arise. To ensure useful skill diversity, we propose a novel skill learning objective, Relative Variational Intrinsic Control (RVIC), which incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment. The resulting set of skills tiles the space of affordances available to the agent. We qualitatively analyze skill behaviors on multiple environments and show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning.
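For concreteness, the mutual information objectives referenced above are, in prior skill discovery methods such as VIC and DIAYN, typically optimized through a variational lower bound. The following is a minimal sketch of that standard bound, with notation ($z$ for the skill, $s_0$ and $s_f$ for the initial and final states of a skill rollout, $q_\phi$ for a learned skill discriminator) assumed here for illustration rather than taken from this paper:

\[
I(Z; S_f \mid S_0) \;\ge\; \mathbb{E}_{z \sim p(z),\; s_f \sim \pi_z}\bigl[\log q_\phi(z \mid s_0, s_f) - \log p(z)\bigr],
\]

so each skill receives an intrinsic reward of $\log q_\phi(z \mid s_0, s_f) - \log p(z)$ for reaching states from which it can be identified. Because any distinguishable difference in final states increases this bound, trivially diverse skill sets can score highly; RVIC, as described above, instead incentivizes skills that are distinguishable specifically in how they change the agent's relationship to its environment.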
