Learning the opportunity cost of time in a patch-foraging task

Although most decision research concerns choice between simultaneously presented options, in many situations options are encountered serially, and the decision is whether to exploit an option or search for a better one. Such problems have a rich history in animal foraging, but we know little about the psychological processes involved. In particular, it is unknown whether learning in these problems is supported by the well-studied neurocomputational mechanisms involved in more conventional tasks. We investigated how humans learn in a patch-foraging task that requires deciding whether to harvest a depleting resource or to switch to a replenished one. The optimal choice (given by the marginal value theorem, MVT) requires comparing the immediate return from harvesting with the opportunity cost of time, which is given by the long-run average reward rate. In two experiments, we varied the opportunity cost across blocks, and subjects adjusted their behavior to blockwise changes in environmental characteristics. We examined how subjects learned their choice strategies by comparing their choice adjustments with two candidate learning rules: one suggested by the MVT, in which the opportunity-cost threshold is estimated as an average over previous rewards, and the predominant incremental-learning theory in neuroscience, temporal-difference (TD) learning. Trial-by-trial decisions were better explained by the MVT threshold-learning rule. These findings expand on the foraging literature, which has focused on steady-state behavior, by elucidating a computational mechanism for learning in switching tasks that is distinct from those used in traditional choice tasks, and they suggest connections to research on average reward rates in other domains of neuroscience.
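
To make the two candidate learning rules concrete, the sketch below contrasts them in a toy version of the harvest-or-leave task. It is a minimal illustration, not the authors' implementation: the task parameters (HARVEST_TIME, TRAVEL_TIME, DEPLETION, NEW_TREE_REWARD), the learning rates, and the discretization of tree states are all assumptions chosen for readability. The MVT rule keeps a single running estimate of the average reward rate and harvests only while the next reward would exceed that opportunity cost; the TD alternative is rendered here as a generic discounted Q-learner over tree-richness states, which may differ in detail from the paper's TD model.

```python
import random

# Illustrative task parameters (NOT taken from the paper): each harvest takes
# HARVEST_TIME seconds and multiplicatively depletes the current tree; leaving
# costs TRAVEL_TIME seconds and yields a fresh tree.
HARVEST_TIME = 3.0       # assumed harvest duration (s)
TRAVEL_TIME = 9.0        # assumed travel duration (s)
DEPLETION = 0.88         # assumed multiplicative depletion per harvest
NEW_TREE_REWARD = 10.0   # assumed reward of a fresh tree


def mvt_learner(n_choices=2000, alpha=0.05, rho_init=0.5):
    """MVT-style threshold learning: track the long-run average reward rate
    rho and harvest only while the expected harvest beats its opportunity
    cost (rho times the time the harvest would take)."""
    rho, reward, total = rho_init, NEW_TREE_REWARD, 0.0
    for _ in range(n_choices):
        if reward * DEPLETION > rho * HARVEST_TIME:
            reward *= DEPLETION              # stay: harvest the depleting tree
            gained, elapsed = reward, HARVEST_TIME
        else:
            reward = NEW_TREE_REWARD         # leave: pay travel time, fresh tree
            gained, elapsed = 0.0, TRAVEL_TIME
        rho += alpha * (gained / elapsed - rho)   # update opportunity-cost estimate
        total += gained
    return rho, total


def td_learner(n_choices=2000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """TD alternative: a simple discounted Q-learner over binned tree states."""
    actions = ("harvest", "leave")
    q = {}                                   # q[(state, action)] -> value estimate
    bin_state = lambda r: round(r, 1)        # discretize tree richness into states
    reward, total = NEW_TREE_REWARD, 0.0
    for _ in range(n_choices):
        s = bin_state(reward)
        if random.random() < epsilon:        # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: q.get((s, x), 0.0))
        if a == "harvest":
            reward *= DEPLETION
            gained = reward
        else:
            reward = NEW_TREE_REWARD
            gained = 0.0
        s_next = bin_state(reward)
        target = gained + gamma * max(q.get((s_next, x), 0.0) for x in actions)
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
        total += gained
    return q, total
```

One intuition this sketch makes visible (an interpretation, not a claim from the paper): the threshold rule has to learn only a single quantity, the average reward rate, whereas the TD learner must estimate a value for every state-action pair, so the threshold rule can in principle track blockwise changes in the environment with far fewer observations.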
