Prefrontal cortex creates novel navigation sequences from hippocampal place-cell replay with spatial reward propagation

As rats learn to search for multiple sources of food or water in a complex environment, they generate increasingly efficient trajectories between reward sites across multiple trials. This optimization capacity has been characterized in the Traveling Salesrat Problem (TSP) (de Jong et al., 2011). Such spatial navigation capacity involves the replay of hippocampal place cells during awake states, generating small sequences of spatially related place-cell activity that we call “snippets”. These snippets occur primarily during sharp-wave-ripple (SWR) events. Here we focus on the role of replay during the awake state, as the animal is learning across multiple trials. We hypothesize that snippet replay generates synthetic data that can substantially expand and restructure the experience available to the prefrontal cortex (PFC), bringing its learning closer to optimal. We developed a model of snippet generation that is modulated by reward, propagated in the forward and reverse directions. This implements a form of spatial credit assignment for reinforcement learning. We use a biologically motivated computational framework known as ‘reservoir computing’ to model PFC in sequence learning, in which large pools of prewired neural elements process information dynamically through reverberations. This PFC model is well suited to consolidating snippets into larger spatial sequences that may later be recalled by subsets of the original sequences. Our simulation experiments provide neurophysiological explanations for two pertinent observations related to navigation. Reward modulation allows the system to reject non-optimal segments of experienced trajectories, and reverse replay allows the system to “learn” trajectories that it has not physically experienced, both of which significantly contribute to the TSP behavior.
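To make the reservoir computing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes leaky tanh units, Gaussian recurrent weights scaled by 1/sqrt(N), one-hot place codes, and a readout trained with the LMS (delta) rule; all class names, function names, and parameter values are hypothetical. Only the readout weights are trained, while the prewired recurrent pool stays fixed.

```python
import math
import random


class Reservoir:
    """Fixed random recurrent network; only the linear readout is trained."""

    def __init__(self, n_in, n_res, n_out, gain=0.9, leak=0.5, seed=0):
        rng = random.Random(seed)
        self.leak = leak
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                     for _ in range(n_res)]
        # Assumption: Gaussian recurrent weights with std gain/sqrt(n_res),
        # a rough stability heuristic, not a computed spectral radius.
        std = gain / math.sqrt(n_res)
        self.w_rec = [[rng.gauss(0.0, std) for _ in range(n_res)]
                      for _ in range(n_res)]
        self.w_out = [[0.0] * n_res for _ in range(n_out)]
        self.state = [0.0] * n_res

    def reset(self):
        self.state = [0.0] * len(self.state)

    def step(self, u):
        """Leaky tanh update of the reservoir state for one input vector."""
        new_state = []
        for i, x_i in enumerate(self.state):
            drive = sum(w * v for w, v in zip(self.w_in[i], u))
            drive += sum(w * x for w, x in zip(self.w_rec[i], self.state))
            new_state.append((1.0 - self.leak) * x_i
                             + self.leak * math.tanh(drive))
        self.state = new_state

    def readout(self):
        return [sum(w * x for w, x in zip(row, self.state))
                for row in self.w_out]

    def lms_update(self, target, lr):
        """One delta-rule (LMS) step on the readout weights only."""
        y = self.readout()
        for k, row in enumerate(self.w_out):
            err = target[k] - y[k]
            for j, x in enumerate(self.state):
                row[j] += lr * err * x


def one_hot(index, n):
    return [1.0 if i == index else 0.0 for i in range(n)]


def run_sequence(res, places, n_places, lr=None):
    """Feed a place sequence; train on next-place targets if lr is given.

    Returns the summed squared next-place prediction error, measured
    before any weight update at each step.
    """
    res.reset()
    total = 0.0
    for here, nxt in zip(places, places[1:]):
        res.step(one_hot(here, n_places))
        target = one_hot(nxt, n_places)
        total += sum((t - y) ** 2 for t, y in zip(target, res.readout()))
        if lr is not None:
            res.lms_update(target, lr)
    return total


if __name__ == "__main__":
    n_places = 6
    snippet = [0, 1, 2, 3, 4, 5]  # hypothetical place-cell indices
    pfc = Reservoir(n_in=n_places, n_res=30, n_out=n_places)
    before = run_sequence(pfc, snippet, n_places)
    for _ in range(300):
        run_sequence(pfc, snippet, n_places, lr=0.05)
    after = run_sequence(pfc, snippet, n_places)
    print("error before:", before, "error after:", after)
```

Because the recurrent state carries a trace of earlier inputs, the readout can in principle predict the next place from sequence context rather than from the current place alone, which is the property that makes this family of models a candidate for stitching short snippets into longer sequences.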
Author Summary

As rats search for multiple sources of food in a complex environment, they generate increasingly efficient trajectories between reward sites, across multiple trials, characterized in the Traveling Salesrat Problem (TSP). This likely involves the coordinated replay of place-cell “snippets” between successive trials. We hypothesize that “snippets” can be used by the prefrontal cortex (PFC) to implement a form of reward-modulated reinforcement learning. Our simulation experiments provide neurophysiological explanations for two pertinent observations related to navigation. Reward modulation allows the system to reject non-optimal segments of experienced trajectories, and reverse replay allows the system to “learn” trajectories that it has not physically experienced, both of which significantly contribute to the TSP behavior.
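One way to picture reward-modulated snippet generation with forward and reverse replay is sketched below. This is an illustration under stated assumptions, not the model from the paper: reward is assumed to spread along the experienced trajectory with exponential decay in both directions, snippets are assumed to be fixed-length contiguous slices sampled in proportion to their mean propagated reward, and each snippet is reversed with a fixed probability. The function names and the `decay`, `length`, and `p_reverse` parameters are hypothetical.

```python
import random


def propagate_reward(trajectory_len, reward_indices, decay=0.8):
    """Spread reward along the trajectory in both directions.

    Assumption: the value at position i is decay ** (distance to the
    nearest rewarded position), so rewarded positions have value 1.0.
    """
    return [max(decay ** abs(i - r) for r in reward_indices)
            for i in range(trajectory_len)]


def sample_snippets(trajectory, reward_indices, n_snippets,
                    length=4, decay=0.8, p_reverse=0.5, seed=0):
    """Draw short replay snippets, favoring segments near reward."""
    rng = random.Random(seed)
    value = propagate_reward(len(trajectory), reward_indices, decay)
    starts = list(range(len(trajectory) - length + 1))
    # Weight each candidate snippet by the mean propagated reward
    # over the places it covers.
    weights = [sum(value[s:s + length]) / length for s in starts]
    snippets = []
    for s in rng.choices(starts, weights=weights, k=n_snippets):
        snippet = trajectory[s:s + length]
        if rng.random() < p_reverse:
            snippet = snippet[::-1]  # reverse replay
        snippets.append(snippet)
    return snippets


if __name__ == "__main__":
    # Hypothetical trajectory of place fields, with reward at the last one.
    path = list("ABCDEFGH")
    for snippet in sample_snippets(path, reward_indices=[7], n_snippets=5):
        print("".join(snippet))
```

Under these assumptions, segments far from any reward are rarely replayed, which loosely mirrors the rejection of non-optimal segments, and reversed snippets expose a downstream learner to transitions in a direction the animal never ran.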

[1] James L. McClelland, et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, 1995, Psychological Review.

[2] Henry Markram, et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations, 2002, Neural Computation.

[3] Matthijs A. A. van der Meer, et al. Hippocampal Replay Is Not a Simple Function of Experience, 2010, Neuron.

[4] Bernard Widrow, et al. Adaptive switching circuits, 1988.

[5] Shantanu P. Jadhav, et al. Multiple modes of hippocampal–prefrontal interactions in memory-guided behavior, 2016, Current Opinion in Neurobiology.

[6] A. David Redish, et al. Hippocampal replay contributes to within session learning in a temporal difference reinforcement learning model, 2005, Neural Networks.

[7] Mattias P. Karlsson, et al. Awake replay of remote experiences in the hippocampus, 2009, Nature Neuroscience.

[8] Harald Haas, et al. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication, 2004, Science.

[9] Margaret F. Carr, et al. Hippocampal replay in the awake state: a potential substrate for memory consolidation and retrieval, 2011, Nature Neuroscience.

[10] Adam Johnson, et al. Triple Dissociation of Information Processing in Dorsal Striatum, Ventral Striatum, and Hippocampus on a Learned Spatial Decision Task, 2010, Neuron.

[11] Jean-Marc Fellous, et al. A Role for the Longitudinal Axis of the Hippocampus in Multiscale Representations of Large and Complex Spatial Environments and Mnemonic Hierarchies, 2017, The Hippocampus - Plasticity and Functions.

[12] L. Swanson, et al. Spatial organization of direct hippocampal field CA1 axonal projections to the rest of the cerebral cortex, 2007, Brain Research Reviews.

[13] Mantas Lukosevicius, et al. A Practical Guide to Applying Echo State Networks, 2012, Neural Networks: Tricks of the Trade.

[14] Matthew A. Wilson, et al. Hippocampal Replay of Extended Experience, 2009, Neuron.

[15] Peter Ford Dominey, et al. Reservoir Computing Properties of Neural Dynamics in Prefrontal Cortex, 2016, PLoS Comput. Biol.

[16] David J. Foster, et al. Reverse replay of behavioural sequences in hippocampal place cells during the awake state, 2006, Nature.

[17] M. Wilson, et al. Coordinated memory replay in the visual cortex and hippocampus during sleep, 2007, Nature Neuroscience.

[18] L. Frank, et al. Rewarded Outcomes Enhance Reactivation of Experience in the Hippocampus, 2009, Neuron.

[19] Brad E. Pfeiffer, et al. Reverse Replay of Hippocampal Place Cells Is Uniquely Modulated by Changing Reward, 2016, Neuron.

[20] M. Witter, et al. Projections from the parahippocampal region to the prefrontal cortex in the rat: evidence of multiple pathways, 2002, The European Journal of Neuroscience.

[21] R. Vertes, et al. Nucleus reuniens of the midline thalamus: Link between the medial prefrontal cortex and the hippocampus, 2007, Brain Research Bulletin.

[22] Jean-Marc Fellous, et al. [Title unavailable], 2011.

[23] B. McNaughton, et al. Reactivation of Hippocampal Cell Assemblies: Effects of Behavioral State, Experience, and EEG Dynamics, 1999, The Journal of Neuroscience.

[24] Jan Bureš, et al. Can rats solve a simple version of the traveling salesman problem?, 1992, Behavioural Brain Research.

[25] Peter Ford Dominey. Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning, 1995, Biological Cybernetics.

[26] G. Buzsáki, et al. Forward and reverse hippocampal place-cell sequences during ripples, 2007, Nature Neuroscience.

[27] D. R. Euston, et al. Fast-Forward Playback of Recent Memory Sequences in Prefrontal Cortex During Sleep, 2007, Science.