Using reinforcement learning models in social neuroscience: frameworks, pitfalls and suggestions of best practices

Abstract: Recent years have witnessed a dramatic increase in the use of reinforcement learning (RL) models in social, cognitive and affective neuroscience. In combination with neuroimaging techniques such as functional magnetic resonance imaging, this approach enables quantitative investigation of latent mechanistic processes. However, the growing use of relatively complex computational approaches has also led to misconceptions and imprecise interpretations. Here, we present a comprehensive framework for examining (social) decision-making with the simple Rescorla–Wagner RL model. We discuss common pitfalls in its application and provide practical suggestions. First, using simulation, we unpack the functional role of the learning rate and pinpoint what can easily go wrong when interpreting differences in the learning rate. Then, we discuss the inevitable collinearity between outcome and prediction error in RL models and suggest how to establish whether an observed neural activation is related to the prediction error rather than to outcome valence. Finally, we suggest that a posterior predictive check is a crucial step after model comparison, and we advocate hierarchical modeling for parameter estimation. We aim to provide simple and scalable explanations and practical guidelines for employing RL models, to assist both beginners and advanced users in better implementing and interpreting their model-based analyses.
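The two simulation points raised in the abstract can be illustrated with a minimal sketch. It assumes the standard Rescorla–Wagner update, V(t+1) = V(t) + alpha * (R(t) - V(t)), applied to a single cue with binary outcomes. The function names, the reward probability of 0.75, the 200 trials and the three learning rates are illustrative assumptions, not values taken from the paper. The script prints, for each learning rate, the Pearson correlation between outcome and prediction error, which is one simple way to see the collinearity the abstract refers to.

```python
import random


def simulate_rw(outcomes, alpha, v0=0.0):
    """Run a Rescorla-Wagner learner over a sequence of outcomes.

    Returns the value held before each outcome and the prediction error
    (outcome minus value) on each trial. All names are illustrative.
    """
    values, pes = [], []
    v = v0
    for r in outcomes:
        pe = r - v            # prediction error on this trial
        values.append(v)      # value expected before seeing the outcome
        pes.append(pe)
        v = v + alpha * pe    # learning rate scales the update
    return values, pes


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Assumed task: one cue rewarded with probability 0.75 over 200 trials.
rng = random.Random(0)
outcomes = [1.0 if rng.random() < 0.75 else 0.0 for _ in range(200)]

for alpha in (0.1, 0.5, 0.9):
    values, pes = simulate_rw(outcomes, alpha)
    # Outcome and prediction error share the outcome term, so they correlate.
    print(f"alpha={alpha}: corr(outcome, PE) = {pearson(outcomes, pes):.3f}")
```

In this toy setting a low learning rate yields a slowly changing value estimate, so the prediction error tracks the outcome closely. A high learning rate lets the value follow recent outcomes more tightly, which loosens the correlation but does not remove it.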
