Modeling the Value of Strategic Actions in the Superior Colliculus

In learning models of strategic game play, an agent constructs a valuation (the action value) over possible future choices as a function of past actions and rewards. Choices are then stochastic functions of these action values. Our goal is to uncover a neural signal that correlates with the action value posited by behavioral learning models. We measured activity from neurons in the superior colliculus (SC), a midbrain region involved in planning saccadic eye movements, while monkeys performed two saccade tasks. In the strategic task, monkeys competed against a computer in a saccade version of the mixed-strategy game "matching pennies". In the instructed task, saccades were elicited through explicit instruction rather than free choice. In both tasks, neuronal activity and behavior were shaped by past actions and rewards, with more recent events exerting a larger influence. Further, SC activity predicted upcoming choices during the strategic task and upcoming reaction times during the instructed task. Finally, we found that neuronal activity in both tasks correlated with the action values computed by an established learning model, the Experience-Weighted Attraction model (Camerer and Ho, 1999). Collectively, our results provide evidence that the action values hypothesized by learning models are represented in motor planning regions of the brain in a manner that could be used to select strategic actions.
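To make the idea of an action value concrete, the sketch below implements the standard Experience-Weighted Attraction update with a logit (softmax) choice rule for a two-target game. It is an illustration only, not the fitting procedure used in the study: the parameter values (`phi`, `delta`, `rho`, `lam`), the payoff function `matching_payoff`, and the randomly playing opponent are all assumptions chosen for the example.

```python
import math
import random


def ewa_update(attractions, experience, chosen, opponent, payoff,
               phi=0.9, delta=0.5, rho=0.9):
    """One Experience-Weighted Attraction step (parameter values are illustrative).

    attractions: current action values, one per action
    experience:  experience weight N(t-1)
    chosen:      index of the action the agent took
    opponent:    index of the action the opponent took
    payoff:      function (action, opponent) -> reward the action would have earned
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for action, old in enumerate(attractions):
        # The chosen action is reinforced by its full payoff; unchosen actions
        # are reinforced by their forgone payoff, scaled by delta.
        weight = 1.0 if action == chosen else delta
        reinforcement = weight * payoff(action, opponent)
        new_attractions.append(
            (phi * experience * old + reinforcement) / new_experience
        )
    return new_attractions, new_experience


def choice_probabilities(attractions, lam=2.0):
    """Logit (softmax) rule: choices are stochastic functions of action values."""
    peak = max(attractions)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (a - peak)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def matching_payoff(action, opponent):
    """Hypothetical payoff: reward 1 when the agent's choice matches the opponent's."""
    return 1.0 if action == opponent else 0.0


def simulate(n_trials=200, seed=0):
    """Play the agent against an opponent that chooses uniformly at random."""
    rng = random.Random(seed)
    attractions, experience = [0.0, 0.0], 1.0
    history = []
    for _ in range(n_trials):
        probs = choice_probabilities(attractions)
        chosen = 0 if rng.random() < probs[0] else 1
        opponent = rng.randrange(2)
        attractions, experience = ewa_update(
            attractions, experience, chosen, opponent, matching_payoff
        )
        history.append((chosen, opponent, tuple(attractions)))
    return history
```

Because `phi` discounts older attractions on every trial, recent outcomes contribute more to the current action value than distant ones, which is the recency weighting described above. The per-trial attractions returned by `simulate` are the kind of model-derived quantity that could, in principle, be regressed against trial-by-trial neuronal activity.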

[1]  P. Glimcher,et al.  Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal , 2005, Neuron.

[2]  A. Cooper,et al.  Predictive Reward Signal of Dopamine Neurons , 2011 .

[3]  P. Glimcher,et al.  Activity in Posterior Parietal Cortex Is Correlated with the Relative Subjective Desirability of Action , 2004, Neuron.

[4]  J. Mayhew,et al.  How Visual Stimuli Activate Dopaminergic Neurons at Short Latency , 2005, Science.

[5]  D. Munoz,et al.  Neuronal Activity in Monkey Superior Colliculus Related to the Initiation of Saccadic Eye Movements , 1997, The Journal of Neuroscience.

[6]  Dilip Mookherjee,et al.  Learning and Decision Costs in Experimental Constant Sum Games , 1997 .

[7]  H. Seo,et al.  Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. , 2007, Cerebral cortex.

[8]  R. Coppola,et al.  Physiological characteristics of capacity constraints in working memory as revealed by functional MRI. , 1999, Cerebral cortex.

[9]  P. Glimcher,et al.  Value Representations in the Primate Striatum during Matching Behavior , 2008, Neuron.

[10]  D. Runkle,et al.  An experimental study of information and mixed-strategy play in the three-person matching-pennies game , 2000 .

[11]  David L. Sparks,et al.  Movement selection in advance of action in the superior colliculus , 1992, Nature.

[12]  P. Glimcher,et al.  Action and Outcome Encoding in the Primate Caudate Nucleus , 2007, The Journal of Neuroscience.

[13]  D. Barraclough,et al.  Reinforcement learning and decision making in monkeys during a competitive game. , 2004, Brain research. Cognitive brain research.

[14]  Eric van Damme,et al.  Non-Cooperative Games , 2000 .

[15]  J. Harsanyi Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points , 1973 .

[16]  H. Seo,et al.  Lateral Intraparietal Cortex and Reinforcement Learning during a Mixed-Strategy Game , 2009, Journal of Neuroscience.

[17]  John R. Anderson,et al.  Working Memory: Activation Limitations on Retrieval , 1996, Cognitive Psychology.

[18]  K. Doya,et al.  Representation of Action-Specific Reward Values in the Striatum , 2005, Science.

[19]  Xin Wang,et al.  Individual Differences in EWA Learning with Partial Payoff Information , 2008 .

[20]  H. Seo,et al.  Temporal Filtering of Reward Signals in the Dorsal Anterior Cingulate Cortex during a Mixed-Strategy Game , 2007, The Journal of Neuroscience.

[21]  A. Rapoport,et al.  Generation of random series in two-person strictly competitive games , 1992 .

[22]  H. Seo,et al.  Cortical mechanisms for reinforcement learning in competitive games , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.

[23]  D. Munoz,et al.  Competitive Integration of Visual and Preparatory Signals in the Superior Colliculus during Saccadic Programming , 2007, The Journal of Neuroscience.

[24]  W. Newsome,et al.  Matching Behavior and the Representation of Value in the Parietal Cortex , 2004, Science.

[25]  R. Klein,et al.  A Model of Saccade Initiation Based on the Competitive Integration of Exogenous and Endogenous Signals in the Superior Colliculus , 2001, Journal of Cognitive Neuroscience.

[26]  B. O'Neill Nonmetric test of the minimax theory of two-person zerosum games. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[27]  W. Schultz Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology , 2004, Current Opinion in Neurobiology.

[28]  C. Padoa-Schioppa,et al.  Neurons in the orbitofrontal cortex encode economic value , 2006, Nature.

[29]  D. Munoz,et al.  Lateral inhibitory interactions in the intermediate layers of the monkey superior colliculus. , 1998, Journal of neurophysiology.

[30]  D. Munoz,et al.  Immediate Neural Plasticity Shapes Motor Performance , 2000, The Journal of Neuroscience.

[31]  Peter Redgrave,et al.  A direct projection from superior colliculus to substantia nigra for detecting salient visual events , 2003, Nature Neuroscience.

[32]  W. Wolf,et al.  Occurrence of human express saccades depends on stimulus uncertainty and stimulus sequence , 2004, Experimental Brain Research.

[33]  Leonidas Spiliopoulos,et al.  Humans versus computer algorithms in repeated mixed strategy games , 2008 .

[34]  Christopher D. Carello,et al.  Manipulating Intent Evidence for a Causal Role of the Superior Colliculus in Target Selection , 2004, Neuron.

[35]  D. Robinson Eye movements evoked by collicular stimulation in the alert monkey. , 1972, Vision research.

[36]  Jeffrey M. Wooldridge Econometric Analysis of Cross Section and Panel Data , 2002 .

[37]  R. Wurtz,et al.  Interaction of the frontal eye field and superior colliculus for saccade generation. , 2001, Journal of neurophysiology.

[38]  J. Neumann,et al.  The Theory of Games and Economic Behaviour , 1944 .

[39]  J. Ochs Games with Unique, Mixed Strategy Equilibria: An Experimental Study , 1995 .

[40]  Dilip Mookherjee,et al.  Learning behavior in an experimental matching pennies game , 1994 .

[41]  M. Dorris,et al.  Role of the Superior Colliculus in Choosing Mixed-Strategy Saccades , 2009, The Journal of Neuroscience.

[42]  J. Neumann,et al.  Theory of Games and Economic Behavior. , 1945 .

[43]  Michael L. Platt,et al.  Neural correlates of decision variables in parietal cortex , 1999, Nature.

[44]  A. Rapoport,et al.  Mixed strategies in strictly competitive games: A further test of the minimax hypothesis , 1992 .

[45]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[46]  P. Glimcher,et al.  MEASURING BELIEFS AND REWARDS: A NEUROECONOMIC APPROACH. , 2010, The quarterly journal of economics.

[47]  A. Roth,et al.  Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria , 1998 .

[48]  E. Thorndike “Animal Intelligence” , 1898, Nature.

[49]  J. Wallis,et al.  Dynamic Encoding of Responses and Outcomes by Neurons in Medial Prefrontal Cortex , 2009, The Journal of Neuroscience.

[50]  Kevin McCabe,et al.  Neural signature of fictive learning signals in a sequential investment task , 2007, Proceedings of the National Academy of Sciences.

[51]  M. Walton,et al.  Action sets and decisions in the medial frontal cortex , 2004, Trends in Cognitive Sciences.

[52]  P. Glimcher,et al.  Dynamic Response-by-Response Models of Matching Behavior in Rhesus Monkeys , 2005, Journal of the Experimental Analysis of Behavior.

[53]  Colin Camerer,et al.  Experience-Weighted Attraction Learning in Normal Form Games , 1999 .

[54]  Timothy E. J. Behrens,et al.  Optimal decision making and the anterior cingulate cortex , 2006, Nature Neuroscience.

[55]  Markus Ullsperger,et al.  Adaptive Coding of Action Values in the Human Rostral Cingulate Zone , 2009, The Journal of Neuroscience.

[56]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[57]  Teck-Hua Ho,et al.  Self-tuning experience weighted attraction learning in games , 2007, J. Econ. Theory.

[58]  D. Barraclough,et al.  Prefrontal cortex and decision making in a mixed-strategy game , 2004, Nature Neuroscience.

[59]  Robert W. Rosenthal,et al.  Testing the Minimax Hypothesis: A Re-examination of O'Neill's Game Experiment , 1990 .

[60]  S. Highstein,et al.  The anatomy and physiology of primate neurons that control rapid eye movements. , 1994, Annual review of neuroscience.

[61]  Robert M. McPeek,et al.  Deficits in saccade target selection after inactivation of superior colliculus , 2004, Nature Neuroscience.