Empowerment for continuous agent–environment systems

This article develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, and also by considerations stemming from curiosity-driven learning. For agent–environment systems with stochastic transitions, empowerment measures how much influence an agent has on its environment, but only that influence which can be sensed by the agent's own sensors. It is an information-theoretic generalization of the joint controllability (influence on the environment) and observability (measurement by the sensors) of the environment by the agent, where controllability and observability are usually defined in control theory via the dimensionality of the control and observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties: for example, it allows salient states to be identified using only the dynamics, and it can act as an intrinsic reward without requiring an external reward signal. However, in this previous work empowerment was limited to small-scale, discrete domains, and the state transition probabilities were assumed to be known. The goal of this article is to extend empowerment to the significantly more important and relevant case of continuous, vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte Carlo approximation; the unknown transitions are addressed by model learning and prediction, for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment, and we include an application to exploration and online model learning.
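To make the quantity concrete: empowerment is the Shannon channel capacity of the channel from the agent's actions to its subsequent sensor readings. The sketch below estimates it for a small set of candidate actions by drawing Monte Carlo samples of the successor state from a (known or learned) transition model and running the Blahut-Arimoto algorithm on the resulting channel. The function names (`monte_carlo_channel`, `empowerment_blahut_arimoto`, `sample_transition`), the one-dimensional toy dynamics, and the histogram binning of successor states are illustrative assumptions, not the density estimation or Gaussian process model used in the article itself.

```python
import numpy as np

def empowerment_blahut_arimoto(p_s_given_a, n_iter=200, tol=1e-10):
    """Channel capacity (in bits) of a discrete channel p(s'|a) via Blahut-Arimoto.
    p_s_given_a: array of shape (n_actions, n_states); each row sums to 1."""
    n_actions = p_s_given_a.shape[0]
    p_a = np.full(n_actions, 1.0 / n_actions)        # start from a uniform action distribution
    for _ in range(n_iter):
        p_s = p_a @ p_s_given_a                      # marginal distribution over successor states
        # D( p(s'|a) || p(s') ) for every action, skipping zero-probability cells
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_s_given_a > 0, np.log(p_s_given_a / p_s), 0.0)
        d = np.sum(p_s_given_a * log_ratio, axis=1)
        p_a_new = p_a * np.exp(d)                    # exponentiated-KL update of the action weights
        p_a_new /= p_a_new.sum()
        if np.max(np.abs(p_a_new - p_a)) < tol:
            p_a = p_a_new
            break
        p_a = p_a_new
    p_s = p_a @ p_s_given_a
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(p_s_given_a > 0, np.log(p_s_given_a / p_s), 0.0)
    return float(np.sum(p_a * np.sum(p_s_given_a * log_ratio, axis=1)) / np.log(2))

def monte_carlo_channel(state, actions, sample_transition, n_samples=500, n_bins=30):
    """Estimate p(s'|a) for each candidate action by sampling successor states from a
    transition model and binning them on a shared grid (a crude stand-in for the
    article's Monte Carlo integration; assumes a 1-D state purely for illustration)."""
    samples = [np.array([sample_transition(state, a) for _ in range(n_samples)])
               for a in actions]
    all_s = np.concatenate(samples)
    edges = np.linspace(all_s.min(), all_s.max() + 1e-9, n_bins + 1)
    counts = np.array([np.histogram(s, bins=edges)[0] for s in samples], dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical usage: a noisy 1-D system in which the action sets the drift.
def sample_transition(s, a, noise=0.1):
    return s + a + noise * np.random.randn()

actions = np.linspace(-1.0, 1.0, 5)                  # discretized one-step action set
p_channel = monte_carlo_channel(0.0, actions, sample_transition)
print("estimated empowerment (bits):", empowerment_blahut_arimoto(p_channel))
```

In the setting of the article, `sample_transition` would itself be replaced by predictions from a Gaussian process model learned online (with iterated forecasting for multi-step action sequences), and the continuous successor-state distribution would be handled by Monte Carlo approximation rather than the fixed histogram used in this toy sketch.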
