Forgetful inference in a sophisticated world model

Humans and other animals are able to discover underlying statistical structure in their environments and exploit it to achieve efficient and effective performance. However, such structure is often difficult to learn and use because it is obscure, involving long-range temporal dependencies. Here, we analysed behavioural data from an extended experiment with rats, showing that the subjects learned the underlying statistical structure, albeit suffering at times from immediate inferential imperfections as to their current state within it. We accounted for their behaviour using a Hidden Markov Model, in which recent observations are integrated with the recollections of an imperfect memory. We found that over the course of training, subjects came to track their progress through the task more accurately, a change that our model largely attributed to decreased forgetting. This ‘learning to remember’ decreased reliance on recent observations, which may be misleading, in favour of a longer-term memory.

Author summary

Humans and other animals possess the remarkable ability to find and exploit patterns and structures in their experience of a complex and varied world. However, such structures are often temporally extended and latent or hidden, being only partially correlated with immediate observations of the world. This makes it essential to integrate current and historical information, and creates a challenging statistical and computational problem. Here, we examine the behaviour of rats facing a version of this challenge posed by a brain-stimulation reward task. We find that subjects learned the general structure of the task, but struggled when immediate observations were misleading. We captured this behaviour with a model in which subjects integrated evidence from their observations together with a memory whose imperfections accounted for their errors. The subjects’ performance improved markedly over successive sessions, allowing them to overcome misleading observations. According to the model, this arose from a process of ‘learning to remember’ in which subjects became better at employing more reliable past observations to determine the hidden state of the world.
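
The idea of combining a recent observation with an imperfect memory can be illustrated with a minimal sketch of a single belief update in a hidden Markov model. This is not the model fitted in the paper: the function name `forgetful_filter_step`, the `forget_rate` parameter, the choice to model forgetting as decay towards a uniform belief, and the three-state example are all assumptions made purely for illustration.

```python
def forgetful_filter_step(belief, transition, likelihood, forget_rate):
    """One belief update in a hidden Markov model with a leaky memory.

    belief      : list of K probabilities over hidden states (the memory)
    transition  : K x K nested list, transition[i][j] = P(next=j | current=i)
    likelihood  : list of K values, P(current observation | state)
    forget_rate : hypothetical parameter in [0, 1]; 0 means perfect memory,
                  1 means the memory is replaced by a uniform guess
    """
    k = len(belief)
    # Predict: carry the remembered belief through the task structure.
    predicted = [sum(belief[i] * transition[i][j] for i in range(k))
                 for j in range(k)]
    # Forget (assumed form): decay the prediction towards a uniform belief.
    remembered = [(1.0 - forget_rate) * p + forget_rate / k for p in predicted]
    # Integrate: weight the decayed memory by the current observation.
    unnormalised = [r * l for r, l in zip(remembered, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Illustrative three-state cycle 0 -> 1 -> 2 -> 0 (hypothetical task).
transition = [[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]]
start = [1.0, 0.0, 0.0]               # subject knows it was in state 0
misleading = [0.6, 0.2, 0.2]          # observation wrongly favours state 0

good_memory = forgetful_filter_step(start, transition, misleading, 0.0)
poor_memory = forgetful_filter_step(start, transition, misleading, 0.9)
```

In this toy setting the true next state is 1. With `forget_rate = 0.0` the belief stays on state 1 despite the misleading observation, whereas with `forget_rate = 0.9` most of the belief shifts to state 0. A lower forgetting rate therefore reduces reliance on the most recent observation, which is the qualitative effect the text describes as ‘learning to remember’.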
