Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex.

Although economic theories based on utility maximization account for a range of choice behaviors, utilities must be estimated through experience. The dynamics of this learning process may account for certain discrepancies between the predictions of economic theories and the real choice behaviors of humans and other animals. To understand the neural mechanisms responsible for such adaptive decision making, we trained rhesus monkeys to play a simulated matching pennies game. Small but systematic deviations of the animals' behavior from the optimal strategy were consistent with the predictions of reinforcement learning theory. In addition, individual neurons in the dorsolateral prefrontal cortex (DLPFC) encoded three different types of signals that can potentially influence the animals' future choices. First, activity modulated by the animals' previous choices might provide the eligibility trace that can be used to attribute a particular outcome to its causative action. Second, activity related to the animals' rewards in previous trials might be used to compute an average reward rate. Finally, the activity of some neurons was modulated by the computer's choices in previous trials and may reflect the process of updating the value functions. These results suggest that the DLPFC might be an important node in the cortical network for decision making.
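To make the reinforcement learning account concrete, the following is a minimal sketch of how value functions for two targets could be updated trial by trial in a matching pennies game. It is not the model fitted in the study: the delta-rule update, the softmax choice rule, the parameters `alpha` and `beta`, the unbiased random opponent, and the rule that a reward is delivered when the two choices match are all assumptions made for illustration.

```python
import math
import random


def update_values(values, choice, reward, alpha=0.2):
    """Return updated value functions after one trial.

    Assumed delta rule: only the value of the chosen target moves
    toward the obtained reward, at learning rate alpha.
    """
    new_values = list(values)
    new_values[choice] += alpha * (reward - new_values[choice])
    return new_values


def choose(values, beta=3.0, rng=random):
    """Pick target 0 or 1 with a softmax (logistic) rule on the value difference."""
    p_target1 = 1.0 / (1.0 + math.exp(-beta * (values[1] - values[0])))
    return 1 if rng.random() < p_target1 else 0


def simulate(n_trials=1000, seed=0):
    """Simulate a learner against a hypothetical unbiased random opponent."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    reward_rate = 0.0  # running average of rewards across trials
    history = []
    for trial in range(1, n_trials + 1):
        animal = choose(values, rng=rng)
        computer = rng.randrange(2)
        # Assumption: reward is delivered when the two choices match.
        reward = 1.0 if animal == computer else 0.0
        values = update_values(values, animal, reward)
        reward_rate += (reward - reward_rate) / trial
        history.append((animal, computer, reward))
    return values, reward_rate, history


if __name__ == "__main__":
    values, reward_rate, history = simulate()
    print("final values:", values)
    print("average reward rate:", reward_rate)
```

The three quantities stored in `history` (the learner's choice, the computer's choice, and the reward) correspond loosely to the three kinds of past-trial signals described above; how the DLPFC actually combines them is left open by the abstract.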
