Paper information - Learning and exploration in action-perception loops

Learning and exploration in action-perception loops

Discovering the structure underlying observed data is a recurring problem in machine learning with important applications in neuroscience. It is also a primary function of the brain. When data can be actively collected in the context of a closed action-perception loop, behavior becomes a critical determinant of learning efficiency. Psychologists studying exploration and curiosity in humans and animals have long argued that learning itself is a primary motivator of behavior. However, the theoretical basis of learning-driven behavior is not well understood. Previous computational studies of behavior have largely focused on the control problem of maximizing acquisition of rewards and have treated learning the structure of data as a secondary objective. Here, we study exploration in the absence of external reward feedback. Instead, we take the quality of an agent's learned internal model to be the primary objective. In a simple probabilistic framework, we derive a Bayesian estimate for the amount of information about the environment an agent can expect to receive by taking an action, a measure we term the predicted information gain (PIG). We develop exploration strategies that approximately maximize PIG. One strategy based on value iteration consistently learns faster than previously developed reward-free exploration strategies across a diverse range of environments. Psychologists believe the evolutionary advantage of learning-driven exploration lies in the generalized utility of an accurate internal model. Consistent with this hypothesis, we demonstrate that agents that learn more efficiently during exploration are later better able to accomplish a range of goal-directed tasks. We conclude by discussing how our work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as in closed-loop neurophysiology studies.

Friedrich T. Sommer | Daniel Y. Little
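
The abstract describes the predicted information gain (PIG) only in words: a Bayesian estimate of how much information about the environment an agent can expect to receive by taking an action. The sketch below illustrates one way such a quantity could be computed. It is not taken from the abstract and makes several assumptions: next-state probabilities for a given state and action are modeled as a categorical distribution with a symmetric Dirichlet prior, the gain from a hypothetical observation is measured as the KL divergence between the updated and the current estimate, and the agent picks actions greedily. The greedy rule is a simpler stand-in for the value-iteration strategy mentioned in the abstract, and all function names and the `alpha` parameter are hypothetical.

```python
import math


def posterior_mean(counts, alpha=1.0):
    """Posterior-mean estimate of next-state probabilities.

    Assumes a symmetric Dirichlet prior with concentration `alpha`
    (an illustrative modeling choice).
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def predicted_information_gain(counts, alpha=1.0):
    """Expected change in the estimate after one more observation.

    For each possible next state, hypothetically add one observation,
    recompute the estimate, and weight the resulting KL divergence by
    the current estimated probability of that next state.
    """
    current = posterior_mean(counts, alpha)
    gain = 0.0
    for outcome, prob in enumerate(current):
        hypothetical = list(counts)
        hypothetical[outcome] += 1
        updated = posterior_mean(hypothetical, alpha)
        gain += prob * kl_divergence(updated, current)
    return gain


def greedy_action(counts_by_action, alpha=1.0):
    """Pick the action with the largest predicted information gain."""
    return max(
        counts_by_action,
        key=lambda a: predicted_information_gain(counts_by_action[a], alpha),
    )


# Transition counts observed so far from one state (hypothetical data):
# "left" has never been tried, "right" has been tried 21 times.
counts_by_action = {"left": [0, 0, 0], "right": [20, 1, 0]}
chosen = greedy_action(counts_by_action)
```

Under these assumptions the untried action has the larger predicted gain, so a greedy agent would choose it. A greedy rule only looks one step ahead; planning over several steps, as a value-iteration strategy does, would also account for reaching informative states later.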
