Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats

Learning to form appropriate, task-relevant working memory representations is a complex process central to cognition. Gating models frame working memory as a collection of past observations and use reinforcement learning (RL) to solve the problem of when to update those observations. Investigation of how gating models relate to brain and behavior, however, remains at an early stage. The current study explored the ability of simple RL gating models to replicate rule-learning behavior in rats. Rats were trained in a maze-based spatial learning task that required trial-by-trial choices contingent upon previous experience. Using an abstract version of this task, we tested the ability of two gating algorithms, one based on the Actor-Critic and the other on the State-Action-Reward-State-Action (SARSA) algorithm, to generate behavior consistent with that of the rats. Both models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored the rats' faster learning following rule reversal. We also found that both gating models learned multiple strategies for solving the initial task, a property that highlights the multi-agent nature of such models and that bears on the neural basis of individual differences in behavior.
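To illustrate the class of model the abstract describes, the following is a minimal sketch (not the authors' implementation) of a SARSA-based gating agent. The task structure, state encoding, and all parameter values here are illustrative assumptions: a cue is shown, the agent chooses whether to gate it into a single working-memory slot, and on the next step, with the cue gone, it must respond according to the (possibly stored) cue. Both the gating choice and the motor choice are learned by the same temporal-difference rule.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters, not taken from the paper.
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = defaultdict(float)  # tabular action values, keyed by (state, action)

def eps_greedy(state, actions):
    """Epsilon-greedy action selection over a small discrete action set."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def run_episode():
    cue = random.choice([0, 1])
    # Step 1: cue is visible, memory slot is empty.
    # The agent's action here is a *gating* decision: 1 = store the cue.
    s1 = (cue, None)
    g = eps_greedy(s1, [0, 1])
    mem = cue if g == 1 else None
    # Step 2: cue is gone; the observable state is ambiguous ('blank'),
    # so only the memory content can disambiguate the correct response.
    s2 = ('blank', mem)
    a = eps_greedy(s2, [0, 1])
    r = 1.0 if a == cue else 0.0
    # SARSA backups: terminal motor step first, then the gating step
    # bootstraps from the value of the action actually taken at step 2.
    Q[(s2, a)] += ALPHA * (r - Q[(s2, a)])
    Q[(s1, g)] += ALPHA * (GAMMA * Q[(s2, a)] - Q[(s1, g)])
    return r

random.seed(0)
rewards = [run_episode() for _ in range(20000)]
print(sum(rewards[-1000:]) / 1000)  # average reward over the last 1000 episodes
```

Because the gating action earns no immediate reward, its value is learned entirely through the bootstrapped term `GAMMA * Q[(s2, a)]`: storing the cue leads to a disambiguated step-2 state with high value, so gating comes to dominate, which is the core idea behind RL solutions to the "when to update working memory" problem.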
