Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule

Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, that directs the changes in appropriate directions. We apply a recently introduced policy learning algorithm from machine learning to networks of spiking neurons and derive a spike-time-dependent plasticity rule that ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis, we show that the derived synaptic plasticity rule is closely related to the widely used BCM rule, for which good biological evidence exists.
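To give a concrete flavor of the kind of rule the abstract describes, the sketch below shows a reward-modulated, policy-gradient weight update for a single stochastic binary neuron with an eligibility trace. The neuron model, the toy task, and all variable names and constants are illustrative assumptions for this sketch, not the paper's exact derivation or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one stochastic binary neuron with n plastic synapses.
# The neuron fires with probability sigmoid(w . x); reward is 1 when the
# output matches a fixed target spike train, 0 otherwise.
n = 20
w = rng.normal(0.0, 0.1, n)          # synaptic weights (the "policy" parameters)
eta = 0.05                           # learning rate
beta = 0.9                           # eligibility-trace decay
z = np.zeros(n)                      # eligibility trace

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

target = rng.integers(0, 2, 200)     # desired spike train for the toy task

for t in range(len(target)):
    x = rng.integers(0, 2, n)        # presynaptic spikes at this time step
    p = sigmoid(w @ x)               # firing probability
    s = float(rng.random() < p)      # postsynaptic spike (0 or 1)

    # Likelihood-ratio term d log p(s | x; w) / dw for a Bernoulli neuron:
    # positive when an "unexpected" spike coincides with presynaptic
    # activity, negative for an unexpected silence -- the STDP-like factor.
    grad_log = (s - p) * x

    # Policy-gradient update: accumulate the trace, then scale by reward.
    z = beta * z + grad_log
    r = 1.0 if s == target[t] else 0.0
    w += eta * r * z
```

The key structural point, which carries over to the spiking-network setting, is that the weight change factorizes into a locally computable pre/post correlation term (the eligibility trace) and a global reward signal that gates when that trace is consolidated into the weights.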
