A unified strategy for implementing curiosity and empowerment driven reinforcement learning

Although there are many approaches to implementing intrinsically motivated artificial agents, the combined use of multiple intrinsic drives remains a relatively unexplored research area. Specifically, we hypothesize that a mechanism capable of quantifying and controlling the evolution of the information flow between the agent and the environment could be the fundamental component for implementing a higher degree of autonomy in artificial agents. This paper proposes a unified strategy for implementing two semantically orthogonal intrinsic motivations: curiosity and empowerment. The curiosity reward informs the agent about the relevance of a recent action, whereas empowerment is implemented as the opposite information flow, from the agent to the environment, and quantifies the agent's potential to control its own future. We show that an additional homeostatic drive is derived from the curiosity reward, which generalizes and enhances the information gain of a classical curious/heterostatic reinforcement learning agent. We show how an internal model shared by curiosity and empowerment facilitates more efficient training of the empowerment function. Finally, we discuss future directions for further leveraging the interplay between these two intrinsic rewards.
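
For orientation, a minimal formal sketch of how these two information flows are commonly defined in the intrinsic-motivation literature (the paper's own definitions may differ in detail): curiosity is often quantified as the information a new transition provides about the agent's internal model, i.e. the environment-to-agent flow, while k-step empowerment is the channel capacity from action sequences to the resulting future state, i.e. the agent-to-environment flow. Here theta denotes the parameters of the internal (forward) model and omega a distribution over k-step action sequences; both symbols are introduced for illustration only:

    r^{\mathrm{cur}}_t \;=\; D_{\mathrm{KL}}\!\left[\, p(\theta \mid s_t, a_t, s_{t+1}) \,\middle\|\, p(\theta \mid s_t, a_t) \,\right]

    \mathcal{E}(s_t) \;=\; \max_{\omega(a_{t:t+k-1} \mid s_t)} I\!\left( A_{t:t+k-1} \,;\, S_{t+k} \,\middle|\, s_t \right)

Under this reading, a single learned transition model can serve both quantities: its posterior update after a transition yields the curiosity signal, and the same model can be rolled out to estimate the action-to-future-state mutual information that defines empowerment, which is one plausible sense in which a shared internal model makes training the empowerment function more efficient.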
