Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules

Most elementary behaviors, such as moving the arm to grasp an object or walking into the next room to explore a museum, evolve on the time scale of seconds; in contrast, neuronal action potentials occur on the time scale of a few milliseconds. Learning rules of the brain must therefore bridge the gap between these two time scales. Modern theories of synaptic plasticity have postulated that the co-activation of pre- and postsynaptic neurons sets a flag at the synapse, called an eligibility trace, that leads to a weight change only if an additional factor is present while the flag is set. This third factor, signaling reward, punishment, surprise, or novelty, could be implemented by the phasic activity of neuromodulators or by specific neuronal inputs signaling special events. While the theoretical framework has been developed over the last few decades, experimental evidence in support of eligibility traces on the time scale of seconds has been collected only during the last few years. Here we review, in the context of three-factor rules of synaptic plasticity, four key experiments that support the role of synaptic eligibility traces in combination with a third factor as a biological implementation of neoHebbian three-factor learning rules.
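
The idea of a flag that is set by co-activation and read out by a third factor can be illustrated with a minimal sketch. The code below is not taken from the reviewed experiments or from any specific model; it assumes, purely for illustration, a single synapse whose eligibility trace jumps by one unit at each pre/post co-activation, decays exponentially with a hypothetical time constant `tau_e` of two seconds, and is converted into a weight change (scaled by a hypothetical learning rate `eta`) only at the moments when the third factor arrives. The function name `simulate` and all parameter values are assumptions.

```python
def simulate(coactivation_times, modulator_times,
             tau_e=2.0, eta=0.5, dt=0.01, t_end=10.0):
    """Toy three-factor rule for one synapse (illustrative assumptions only).

    coactivation_times: times (s) at which pre and post are jointly active
    modulator_times:    times (s) at which the third factor is present
    Returns the accumulated weight change.
    """
    e = 0.0  # eligibility trace (the "flag" at the synapse)
    w = 0.0  # accumulated weight change
    n_steps = int(round(t_end / dt))
    co_steps = {int(round(t / dt)) for t in coactivation_times}
    mod_steps = {int(round(t / dt)) for t in modulator_times}

    for k in range(n_steps):
        # The trace decays on the time scale of seconds (assumed exponential).
        e -= dt * e / tau_e
        # Factors 1 and 2: Hebbian co-activation sets the flag.
        if k in co_steps:
            e += 1.0
        # Factor 3: the weight changes only if the modulator arrives
        # while the flag is still set.
        if k in mod_steps:
            w += eta * e
    return w


if __name__ == "__main__":
    print(simulate([1.0], [2.0]))  # third factor 1 s after co-activation
    print(simulate([1.0], [8.0]))  # third factor 7 s later: smaller change
    print(simulate([1.0], []))     # no third factor: no change
    print(simulate([1.0], [0.5]))  # third factor before co-activation: no change
```

In this sketch, co-activation alone or the third factor alone leaves the weight unchanged, and the size of the change shrinks as the delay between co-activation and the third factor grows, which is the qualitative behavior the three-factor framework describes.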
