Memory mechanisms predict sampling biases in sequential decision tasks

Good decisions are informed by past experience. Accordingly, models of memory encoding and retrieval can shed light on the evaluation processes underlying choice. In one classic memory model aimed at explaining biases in free recall, known as the temporal context model (TCM), a drifting temporal context serves as a cue for retrieving previously encoded items. The associations built by this model share a number of similarities with the successor representation (SR), a particular type of world model used in reinforcement learning to capture the long-run consequences of actions. Here, we show how decision variables may be constructed by retrieval in the TCM, corresponding to drawing samples from the SR. Since the SR and TCM encode long-term sequential relationships, this provides a mechanistic, process-level model for evaluating candidate actions in sequential, multi-step tasks, connecting action evaluation to the details of memory encoding and retrieval. This framework reveals three ways in which the phenomenology of memory predicts novel choice biases that are counterintuitive from a decision perspective: the effects of emotion, of sequential retrieval, and of backward reactivation. The suggestion that the brain employs an efficient sampling algorithm to rapidly compute decision variables offers a normative view of decision biases, explains patterns of memory retrieval during deliberation, and may shed light on psychiatric symptoms such as rumination and craving.
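To make the central idea concrete, the following minimal sketch (not taken from the paper; the function names, toy transition matrix, and use of NumPy are illustrative assumptions) computes the SR as the discounted expected future occupancy of each state, M = (I - gamma*T)^{-1}, and then estimates a decision variable for a candidate state by drawing successor states in proportion to the corresponding SR row and averaging their rewards. This sampling step stands in for the idea that context-cued retrieval draws samples from SR-like associations.

```python
import numpy as np

def successor_representation(T, gamma=0.9):
    """Closed-form SR for a fixed policy's state transition matrix T:
    M = I + gamma*T + gamma^2*T^2 + ... = (I - gamma*T)^{-1}."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

def sampled_value(M, rewards, state, n_samples=100, rng=None):
    """Estimate the decision variable V(state) = M[state] @ rewards by
    sampling successor states in proportion to the SR row, mimicking
    retrieval as a sampling process."""
    rng = np.random.default_rng() if rng is None else rng
    row = M[state]
    probs = row / row.sum()          # turn occupancies into sampling probabilities
    samples = rng.choice(len(rewards), size=n_samples, p=probs)
    # Rescale the mean sampled reward by total discounted occupancy.
    return row.sum() * rewards[samples].mean()

# Toy example: a 3-state chain with reward only in the final (absorbing) state.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
rewards = np.array([0.0, 0.0, 1.0])
M = successor_representation(T, gamma=0.9)
print(sampled_value(M, rewards, state=0, n_samples=1000))
print(M[0] @ rewards)  # exact value for comparison
```

On this toy chain the sampled estimate converges to the exact value M[0] @ rewards as the number of samples grows; biases in how samples are drawn (e.g., overweighting emotional or recently reactivated states) would correspondingly bias the resulting decision variable.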
