Cortical substrates for exploratory decisions in humans

Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning (RL) theory, indicates that a dopaminergic, striatal and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical and ethological perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
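The abstract does not spell out the choice rule here, so purely as an illustrative sketch: one computationally standard way to trade off exploration and exploitation in a multi-armed bandit is softmax (Boltzmann) action selection over learned values, with each choice of a non-maximal-value arm counting as "exploratory". The function below, its parameter names (`alpha`, `beta`), and the reward probabilities are all hypothetical choices for demonstration, not the paper's fitted model.

```python
import math
import random

def softmax_bandit(payoffs, n_trials=1000, alpha=0.1, beta=3.0, seed=0):
    """Simulate softmax choice on a Bernoulli multi-armed bandit.

    Q-values are learned with a delta rule; a trial is labelled
    'exploratory' when the chosen arm is not the current highest-valued one.
    This is a generic RL sketch, not the specific model fitted in the paper.
    """
    rng = random.Random(seed)
    q = [0.0] * len(payoffs)          # estimated value of each arm
    n_explore = 0
    for _ in range(n_trials):
        # Softmax choice probabilities: p(a) proportional to exp(beta * Q(a)).
        exps = [math.exp(beta * v) for v in q]
        r = rng.random() * sum(exps)
        choice, acc = 0, exps[0]
        while acc < r:
            choice += 1
            acc += exps[choice]
        if q[choice] < max(q):
            n_explore += 1            # picked an arm other than the current best
        reward = 1.0 if rng.random() < payoffs[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])   # delta-rule value update
    return q, n_explore

# Three hypothetical slot machines with reward probabilities 0.2, 0.5, 0.8.
q, n_explore = softmax_bandit([0.2, 0.5, 0.8])
```

Lowering `beta` makes choices more random (more exploration); raising it makes them more greedy, which is the sense in which a single fitted parameter can classify individual decisions as exploratory or exploitative.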
