Neurocomputational mechanisms of adaptive learning in social exchanges

Prior work on prosocial and self-serving behavior in human economic exchanges has shown that counterparts' high social reputations bias striatal reward signals and elicit cooperation, even when such cooperation is disadvantageous. This phenomenon suggests that the human striatum is modulated by the other's social value, which is insensitive to the individual's own choices to cooperate or defect. We tested an alternative hypothesis: when people learn from their interactions with others, they encode prediction error updates with respect to their own policy. Under this policy-update account, striatal signals would reflect positive prediction errors whenever the individual's choice correctly anticipated the counterpart's behavior, whether that behavior was cooperation or defection. We examined behavior in three samples using reinforcement learning and model-free analyses, and performed an fMRI study of striatal learning signals. To uncover the dynamics of goal-directed learning, we introduced reversals in the counterpart's behavior and provided counterfactual (would-be) feedback when the individual chose not to engage with the counterpart. Behavioral data and model-derived prediction error maps (in both whole-brain and a priori striatal region-of-interest analyses) supported the policy-update model. Thus, as people continually adjust their rate of cooperation based on experience, their behavior and striatal learning signals reveal a self-centered instrumental process corresponding to reciprocal altruism.
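The contrast between the two accounts can be illustrated with a minimal sketch. It is not the authors' fitted model: the function names, the binary outcome coding, and the learning rate are assumptions chosen for illustration. The sketch computes, for a single trial, a prediction error coded relative to the counterpart's social value and one coded relative to the individual's own policy. Because counterfactual feedback is available, the estimate is updated whether or not the individual engaged.

```python
def prediction_errors(p_cooperate, engaged, cooperated):
    """Return (social_value_pe, policy_pe) for one trial.

    p_cooperate: current estimate (0..1) that the counterpart cooperates.
    engaged:     True if the individual chose to cooperate with the counterpart.
    cooperated:  True if the counterpart cooperated (or would have, when the
                 individual did not engage and saw counterfactual feedback).
    All coding choices here are illustrative assumptions.
    """
    # Social-value coding: the error tracks the counterpart's behavior only,
    # regardless of what the individual chose.
    outcome = 1.0 if cooperated else 0.0
    social_value_pe = outcome - p_cooperate

    # Policy coding: the error tracks whether the individual's own choice
    # matched the counterpart's behavior.
    expected_correct = p_cooperate if engaged else 1.0 - p_cooperate
    correct = 1.0 if engaged == cooperated else 0.0
    policy_pe = correct - expected_correct
    return social_value_pe, policy_pe


def update_estimate(p_cooperate, cooperated, learning_rate=0.3):
    """Delta-rule update of the cooperation estimate (hypothetical rate)."""
    outcome = 1.0 if cooperated else 0.0
    return p_cooperate + learning_rate * (outcome - p_cooperate)


if __name__ == "__main__":
    # The individual expects defection, stays out, and the counterpart
    # would indeed have defected.
    social_pe, policy_pe = prediction_errors(0.2, engaged=False, cooperated=False)
    print("social-value PE:", social_pe)
    print("policy PE:", policy_pe)
```

In this example the two codings disagree in sign: the social-value error is negative because the counterpart defected, while the policy error is positive because the choice to stay out was correct. That is the case in which the two accounts make different predictions about striatal signals.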
