A State Representation for Reinforcement Learning and Decision-Making in the Orbitofrontal Cortex

Abstract: Despite decades of research, the exact ways in which the orbitofrontal cortex (OFC) influences cognitive function have remained mysterious. Anatomically, the OFC is characterized by remarkably broad connectivity to sensory, limbic, and subcortical areas, and functional studies have implicated the OFC in a plethora of functions ranging from facial processing to value-guided choice. Notwithstanding such diversity of findings, much research suggests that one important function of the OFC is to support decision-making and reinforcement learning. Here, we describe a novel theory positing that the OFC's specific role in decision-making is to provide an up-to-date representation of task-related information, called a state representation. This representation reflects a mapping between distinct task states and sensory as well as unobservable information. We summarize evidence supporting the existence of such state representations in rodent and human OFC and argue that forming these state representations provides a crucial scaffold that allows animals to efficiently perform decision-making and reinforcement learning in high-dimensional and partially observable environments. Finally, we argue that our theory offers an integrating framework for linking the diversity of functions ascribed to the OFC and is in line with its wide-ranging connectivity.
