Title: Biased credit assignment in motivational learning biases arises through prefrontal influences on striatal learning

Actions are biased by the outcomes they can produce: humans are more likely to act under the prospect of reward, but hold back under the prospect of punishment. Such motivational biases derive not only from biased response selection, but also from biased learning: humans tend to attribute rewards to their own actions, but are reluctant to attribute punishments to having held back. The neural origin of these biases is unclear; in particular, it remains open whether motivational biases arise solely from an evolutionarily old, subcortical architecture or also from younger, cortical influences. Simultaneous EEG-fMRI allowed us to track which regions encoded biased prediction errors, and in which order. Biased prediction errors occurred in cortical regions (ACC, vmPFC, PCC) before subcortical regions (striatum). These results highlight that biased learning is not a mere feature of the basal ganglia, but arises through prefrontal cortical contributions, revealing motivational biases to be a potentially flexible, sophisticated mechanism. Cortical influences on subcortical learning explain why we attribute rewards to actions, but not punishments to inactions.
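The idea of biased credit assignment can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple delta-rule (Rescorla-Wagner style) value update in which the learning rate is raised for rewarded "go" responses and lowered for punished "nogo" responses. The function name `biased_update` and the parameters `base_lr` and `bias` are hypothetical, and their values are chosen only for illustration.

```python
def biased_update(q, action, outcome, base_lr=0.2, bias=0.1):
    """Apply one delta-rule update with an action-dependent learning rate.

    q       : current value estimate for the chosen action
    action  : "go" (active response) or "nogo" (holding back)
    outcome : +1 for reward, -1 for punishment, 0 for neutral
    Returns the updated value and the (biased) weighted prediction error.
    All parameter values are illustrative assumptions.
    """
    lr = base_lr
    if action == "go" and outcome > 0:
        # Rewards are readily credited to one's own action.
        lr = base_lr + bias
    elif action == "nogo" and outcome < 0:
        # Punishments are only weakly credited to having held back.
        lr = base_lr - bias

    prediction_error = outcome - q
    weighted_pe = lr * prediction_error
    return q + weighted_pe, weighted_pe


if __name__ == "__main__":
    q_go, pe_go = biased_update(0.0, "go", 1.0)
    q_nogo, pe_nogo = biased_update(0.0, "nogo", -1.0)
    # The rewarded "go" update is larger in magnitude than the punished "nogo" update.
    print(abs(pe_go) > abs(pe_nogo))
```

Under these assumptions, the same unsigned outcome changes the value estimate more after a rewarded action than after a punished inaction, which is the asymmetry the abstract describes in words.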
