The successor representation in human reinforcement learning

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. ‘Model-based’ algorithms compute the value of candidate actions from scratch, whereas ‘model-free’ algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation’s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task’s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
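The revaluation asymmetry at the heart of this argument can be made concrete with a small worked example. The sketch below is illustrative Python, not the authors' code; the chain task, discount factor, and reward values are assumptions chosen for exposition. It computes values from a cached successor matrix M = (I − γT)⁻¹, so that V = MR: changing the reward vector R updates values immediately, whereas changing the transition structure T leaves the cached M stale until it is relearned from new experience.

```python
import numpy as np

# Minimal sketch of successor-representation (SR) valuation on an assumed
# four-state chain task: s0 -> s1 -> s2 -> s3 (absorbing). All task details
# here are illustrative, not taken from the paper's experiments.

n_states = 4
gamma = 0.9  # discount factor (assumed)

# One-step transition matrix T for the chain.
T = np.zeros((n_states, n_states))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0
T[3, 3] = 1.0  # terminal state is absorbing

# The SR caches discounted expected future state occupancies:
# M = sum_t gamma^t * T^t = (I - gamma * T)^(-1)
M = np.linalg.inv(np.eye(n_states) - gamma * T)

# State values are a simple linear readout of the cached predictions.
R = np.array([0.0, 0.0, 0.0, 1.0])  # reward only at the terminal state
V = M @ R

# Reward revaluation: change R, and values are correct immediately,
# with no relearning of M required (the SR's flexible side).
R_new = np.array([0.0, 0.0, 0.0, 5.0])
V_reval = M @ R_new

# Transition revaluation: rewire the task so s1 loops back to s0 and the
# reward is no longer reachable. The cached M is now stale.
T_changed = T.copy()
T_changed[1] = 0.0
T_changed[1, 0] = 1.0
M_true = np.linalg.inv(np.eye(n_states) - gamma * T_changed)

print(M @ R_new)       # stale SR values: still expect reward from s0, s1
print(M_true @ R_new)  # true values under the new transitions: zero there
```

In the paper's terms, reward revaluation succeeds using stored predictions alone, while transition revaluation exposes the cached predictions' insensitivity to changes in the task's sequence of events; only fresh experience (or model-based recomputation) repairs M.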
