Efficient Empowerment Estimation for Unsupervised Stabilization

Intrinsically motivated artificial agents learn advantageous behavior without externally provided rewards. It has previously been shown that maximizing the mutual information between an agent's actuators and future states, known as the empowerment principle, enables unsupervised stabilization of dynamical systems at upright positions, a prototypical intrinsically motivated behavior underlying upright standing and walking. This follows from the fact that the stabilization objective and the empowerment objective coincide. Unfortunately, sample-based estimation of this mutual information is challenging. Recently, various variational lower bounds (VLBs) on empowerment have been proposed as solutions; however, they are often biased and unstable in training, and they have high sample complexity. In this work, we propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel, which allows us to efficiently compute an unbiased estimator of empowerment by convex optimization. We demonstrate our solution for sample-based unsupervised stabilization on several dynamical control systems and show the advantages of our method by comparing it to existing VLB approaches. Specifically, we show that our method has lower sample complexity, is more stable in training, preserves the essential properties of the empowerment function, and enables empowerment estimation from images. Consequently, our method opens a path to wider and easier adoption of empowerment for various applications.
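For a linearized system, the Gaussian-channel view makes the computation concrete: if one step of the dynamics is modeled as s' = A a + ε with Gaussian noise ε, then empowerment is the capacity of that channel under an actuation power constraint, which the classic water-filling solution yields in closed form. The sketch below is a minimal illustration of that capacity computation, not the paper's implementation; the matrix `A`, noise scale `sigma`, and budget `power` are hypothetical placeholders for quantities a trained model would supply.

```python
import numpy as np

def gaussian_empowerment(A, sigma=0.1, power=1.0):
    """Capacity (nats) of the channel y = A x + N(0, sigma^2 I)
    under the input power constraint tr(Cov[x]) <= power."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values of the channel matrix
    g = (s / sigma) ** 2                     # per-mode signal-to-noise gains
    g = g[g > 0]
    if g.size == 0:
        return 0.0                           # no controllable modes: zero empowerment
    # Water-filling: allocate p_i = max(0, mu - 1/g_i) with sum(p_i) = power.
    inv_g = np.sort(1.0 / g)
    for k in range(len(inv_g), 0, -1):       # try the k strongest modes
        mu = (power + inv_g[:k].sum()) / k   # candidate water level
        if mu > inv_g[k - 1]:                # all k modes receive positive power
            break
    p = np.maximum(0.0, mu - 1.0 / g)
    return 0.5 * np.sum(np.log1p(g * p))

# Hypothetical example: a 2-state linearization driven by 2 actuators.
A = np.array([[1.0, 0.3],
              [0.0, 0.5]])
print(gaussian_empowerment(A))               # empowerment estimate in nats
```

In the paper's setting, a trainable model would supply the channel parameters at each state, and the resulting capacity would serve as the intrinsic reward driving stabilization; since the constrained capacity problem is convex, the estimate is obtained without the bias and training instability of VLB estimators.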
