Prospective Coding by Spiking Neurons

Animals learn to make predictions, such as associating the sound of a bell with upcoming feeding or predicting the movement that a motor command elicits. How predictions are realized at the neuronal level and what plasticity rule underlies their learning are not well understood. Here we propose a biologically plausible synaptic plasticity rule for learning predictions at the single-neuron level on a timescale of seconds. The learning rule allows a spiking two-compartment neuron to match its current firing rate to its own expected future discounted firing rate. For instance, if an originally neutral event is repeatedly followed by an event that elevates the firing rate of a neuron, the originally neutral event will eventually also elevate the neuron’s firing rate. The plasticity rule is a form of spike-timing-dependent plasticity in which a presynaptic spike followed by a postsynaptic spike leads to potentiation. Even if the plasticity window has a width of 20 milliseconds, associations on the timescale of seconds can be learned. We illustrate prospective coding with three examples: learning to predict a time-varying input, learning to predict the next stimulus in a delayed paired-associate task, and learning with a recurrent network to reproduce a temporally compressed version of a sequence. We discuss the potential role of the learning mechanism in classical trace conditioning. In the special case that the signal to be predicted encodes reward, the neuron learns to predict the discounted future reward, and learning is closely related to the temporal difference learning algorithm TD(λ).
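
To make the quantity "discounted future reward" concrete, the following is a minimal sketch of textbook tabular TD(λ) with accumulating eligibility traces on a single deterministic trial. It is not the spiking two-compartment plasticity rule proposed here, only an illustration of the algorithm the abstract relates it to. The function names, the discrete time steps, and all parameter values (`gamma`, `lam`, `alpha`, the trial length, the reward time) are assumptions chosen for illustration.

```python
import numpy as np


def discounted_return(reward, gamma):
    """Discounted sum of future rewards, G[t] = reward[t] + gamma * G[t + 1]."""
    g = np.zeros(len(reward))
    acc = 0.0
    for t in reversed(range(len(reward))):
        acc = reward[t] + gamma * acc
        g[t] = acc
    return g


def td_lambda_trial_values(reward, gamma=0.9, lam=0.8, alpha=0.1, n_trials=500):
    """Tabular TD(lambda) where state t is the time step since cue onset.

    Illustrative assumption: every trial is identical and deterministic,
    and reward[t] is received on the transition out of state t.
    """
    n_steps = len(reward)
    v = np.zeros(n_steps + 1)  # v[n_steps] is the terminal state, stays 0
    for _ in range(n_trials):
        trace = np.zeros(n_steps + 1)  # eligibility traces, reset per trial
        for t in range(n_steps):
            trace *= gamma * lam       # decay all traces
            trace[t] += 1.0            # mark the current state as eligible
            delta = reward[t] + gamma * v[t + 1] - v[t]  # TD error
            v += alpha * delta * trace
    return v[:n_steps]


# Hypothetical trial: cue at step 0, a single reward at step 15.
reward = np.zeros(20)
reward[15] = 1.0
learned = td_lambda_trial_values(reward)
target = discounted_return(reward, gamma=0.9)
```

In this toy setting the learned value at cue onset (`learned[0]`) rises from zero towards the discounted value of the later reward (`target[0]`), loosely mirroring how an originally neutral event comes to elevate the prediction once it is repeatedly followed by a rewarding one.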
