Learning and exploration in action-perception loops

Discovering the structure underlying observed data is a recurring problem in machine learning with important applications in neuroscience. It is also a primary function of the brain. When data can be actively collected in the context of a closed action-perception loop, behavior becomes a critical determinant of learning efficiency. Psychologists studying exploration and curiosity in humans and animals have long argued that learning itself is a primary motivator of behavior. However, the theoretical basis of learning-driven behavior is not well understood. Previous computational studies of behavior have largely focused on the control problem of maximizing the acquisition of rewards and have treated learning the structure of data as a secondary objective. Here, we study exploration in the absence of external reward feedback and instead take the quality of an agent's learned internal model to be the primary objective. In a simple probabilistic framework, we derive a Bayesian estimate of the amount of information about the environment that an agent can expect to receive by taking an action, a measure we term the predicted information gain (PIG). We develop exploration strategies that approximately maximize PIG. One strategy, based on value iteration, consistently learns faster than previously developed reward-free exploration strategies across a diverse range of environments. Psychologists believe the evolutionary advantage of learning-driven exploration lies in the generalized utility of an accurate internal model. Consistent with this hypothesis, we demonstrate that agents that learn more efficiently during exploration are later better able to accomplish a range of goal-directed tasks. We conclude by discussing how our work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as in closed-loop neurophysiology studies.
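To make the idea of an expected information gain concrete, the following is a minimal sketch and not the derivation referred to in the abstract. It assumes a discrete set of outcomes for a single state-action pair, a symmetric Dirichlet prior over the outcome distribution, and information measured as the Kullback-Leibler divergence (in bits) between the agent's estimate after and before a hypothetical observation. The function names and the `alpha` parameter are hypothetical and chosen only for illustration.

```python
import math


def posterior_mean(counts, alpha=1.0):
    """Mean of a Dirichlet posterior given outcome counts (assumed symmetric prior alpha)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def predicted_information_gain(counts, alpha=1.0):
    """Expected KL divergence between the updated and the current estimate,
    averaged over outcomes weighted by the current estimate (illustrative only)."""
    current = posterior_mean(counts, alpha)
    gain = 0.0
    for outcome, prob in enumerate(current):
        # Hypothetically observe `outcome` once and recompute the estimate.
        updated_counts = list(counts)
        updated_counts[outcome] += 1
        updated = posterior_mean(updated_counts, alpha)
        gain += prob * kl_divergence(updated, current)
    return gain


def greedy_action(counts_per_action, alpha=1.0):
    """Pick the action whose next observation is expected to be most informative."""
    return max(
        counts_per_action,
        key=lambda action: predicted_information_gain(counts_per_action[action], alpha),
    )


if __name__ == "__main__":
    # Hypothetical counts: "left" has been tried often, "right" never.
    counts = {"left": [50, 50], "right": [0, 0]}
    for action, c in counts.items():
        print(action, predicted_information_gain(c))
    print("greedy choice:", greedy_action(counts))
```

Under these assumptions, an action that has rarely been tried yields a larger expected gain than one whose outcomes are already well estimated, so a greedy agent is drawn toward what it knows least about. A strategy that plans several steps ahead, such as the value-iteration approach mentioned in the abstract, would use such a quantity as a per-step objective rather than maximizing it one action at a time.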
