Information-Seeking, Learning and the Marginal Value Theorem: A Normative Approach to Adaptive Exploration

Daily life often forces us to choose between two goals: maximizing immediate rewards (exploitation) and learning about the environment so as to improve our options for future rewards (exploration). An adaptive organism should therefore place value on information independent of immediate reward, and affective states may signal such value (e.g., curiosity vs. boredom: Hill & Perkins, 1985; Eastwood et al., 2012). This tradeoff has been well studied in “bandit” tasks involving choice among a fixed number of options, but it is equally pertinent in situations such as foraging, hunting, or job search, where one encounters a series of new options sequentially. Here, we augment the classic serial foraging scenario to more explicitly reward the development of knowledge. We develop a formal model that quantifies the value of information in this setting and how it should affect decision making, paralleling the treatment of reward by the marginal value theorem (MVT) in the foraging literature. We then present the results of an experiment designed to provide an initial test of this model, and discuss the implications of this information-foraging framework for boredom and task disengagement.

[1] Ola Olsson, et al. The foraging benefits of information and the penalty of ignorance, 2006.

[2] P. Taylor, et al. Test of optimal sampling by foraging great tits, 1978.

[3] Jonathan D. Cohen, et al. Humans use directed and random exploration to solve the explore-exploit dilemma, 2014, Journal of experimental psychology. General.

[4] Jürgen Schmidhuber, et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.

[5] Robert L. Goldstone, et al. Human foraging behavior in a virtual environment, 2004, Psychonomic bulletin & review.

[6] H. Fowler, et al. Satiation and Curiosity: Constructs for a Drive and Incentive-Motivational Theory of Exploration, 1967.

[7] C. Gallistel. The organization of learning, 1990.

[8] Stanley J. Rosenschein, et al. From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, 1996.

[9] A. B. Hill, et al. Towards a model of boredom, 1985, British journal of psychology.

[10] Peter Bossaerts, et al. Do not Bet on the Unknown Versus Try to Find Out More: Estimation Uncertainty and “Unexpected Uncertainty” Both Modulate Exploration, 2012, Front. Neurosci.

[11] Edward L. Deci, et al. Intrinsic Motivation and Self-Determination in Human Behavior, 1975, Perspectives in Social Psychology.

[12] Timothy E. J. Behrens, et al. Learning the value of information in an uncertain world, 2007, Nature Neuroscience.

[13] E. Charnov. Optimal foraging, the marginal value theorem, 1976, Theoretical population biology.

[14] P. Dayan, et al. Cortical substrates for exploratory decisions in humans, 2006, Nature.

[15] Angela J. Yu, et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.

[16] Andrew G. Barto, et al. An intrinsic reward mechanism for efficient exploration, 2006, ICML.

[17] Carlos Diuk, et al. Hierarchical Learning Induces Two Simultaneous, But Separable, Prediction Errors in Human Basal Ganglia, 2013, The Journal of Neuroscience.

[18] Robert C. Wilson, et al. An Approximately Bayesian Delta-Rule Model Explains the Dynamics of Belief Updating in a Changing Environment, 2010, The Journal of Neuroscience.

[19] Angela L. Duckworth, et al. An opportunity cost model of subjective effort and task performance, 2013, The Behavioral and brain sciences.

[20] M. Fenske, et al. The Unengaged Mind, 2012, Perspectives on psychological science: a journal of the Association for Psychological Science.

[21] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[22] John M. Pearson, et al. Neurons in Posterior Cingulate Cortex Signal Exploratory Decisions in a Dynamic Multioption Choice Task, 2009, Current Biology.