Reinforcement learning of a simple control task using the spike response model

Abstract

In this work, we propose a variation of a direct reinforcement learning algorithm suitable for use with spiking neurons based on the spike response model (SRM). The SRM is a biologically inspired, flexible model of a spiking neuron. It is built on kernel functions that describe how the reception and emission of spikes affect the neuron's membrane potential. In our experiments, the spikes emitted by an SRM neuron are used as the input signals to a simple control task. The direct reinforcement learning algorithm uses the reinforcement signal obtained from the environment to modify the synaptic weights of the neuron. This adjusts the spike firing times so that the neuron performs better on the given problem. On simple control tasks, the results are comparable to those of classic methods based on value-function approximation and temporal-difference learning.
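The following minimal Python (NumPy) sketch illustrates the two ingredients described above. It assumes the simplified SRM0 form, in which the membrane potential is u(t) = η(t - t̂) + Σ_j w_j Σ_f ε(t - t_j^f). Here t̂ is the neuron's last output spike and t_j^f are the spike times of presynaptic neuron j.

Several choices are illustrative assumptions and are not taken from this work:

- the kernel shapes (a difference of exponentials for ε and a decaying negative after-potential for η);
- all parameter values;
- the sigmoidal escape noise that turns u into a per-step firing probability p = σ((u - θ)/Δu);
- the episodic REINFORCE update with a running-average baseline.

Together these make a generic likelihood-ratio instance of direct (policy-gradient) reinforcement learning, not the specific variant proposed here. Because ∂u/∂w_j equals the summed PSP of input j, the gradient of the log-likelihood of the emitted spike train is Σ_t (S_t - p_t) PSP_j(t) / Δu. This sum is what the eligibility vector accumulates.

```python
import numpy as np

# Hypothetical kernel and noise parameters (illustrative, not taken from the paper).
TAU_M, TAU_S = 10.0, 2.5    # membrane and synaptic time constants of eps (ms)
TAU_R, ETA0 = 4.0, 5.0      # time constant and depth of the refractory kernel eta
THETA, DELTA_U = 1.0, 0.2   # soft firing threshold and escape-noise width
DT = 1.0                    # simulation time step (ms)


def eps(s):
    """Response to a presynaptic spike s ms ago: difference of exponentials, 0 for s <= 0."""
    s = np.maximum(np.asarray(s, dtype=float), 0.0)  # causal: future spikes contribute 0
    return np.exp(-s / TAU_M) - np.exp(-s / TAU_S)


def eta(s):
    """Refractory after-potential s ms after the neuron's own last spike (negative)."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0, -ETA0 * np.exp(-np.maximum(s, 0.0) / TAU_R), 0.0)


def run_episode(w, inputs, T, rng):
    """Simulate one SRM0 neuron with sigmoidal escape noise for T ms.

    inputs[j] holds the spike times of presynaptic neuron j. Returns the output
    spike times and the eligibility vector d log P(output spikes) / d w.
    """
    last_spike = -np.inf            # no previous output spike, so eta(inf) = 0
    out_spikes = []
    elig = np.zeros(len(w))
    for t in np.arange(0.0, T, DT):
        # PSP_j(t) = sum_f eps(t - t_j^f), which is also du/dw_j
        psp = np.array([eps(t - tj).sum() for tj in inputs])
        u = eta(t - last_spike) + w @ psp
        p = 1.0 / (1.0 + np.exp(-(u - THETA) / DELTA_U))  # spike probability in this step
        s = rng.random() < p
        # d/dw [s log p + (1 - s) log(1 - p)] = (s - p) * psp / DELTA_U
        elig += (float(s) - p) * psp / DELTA_U
        if s:
            last_spike = t
            out_spikes.append(t)
    return out_spikes, elig


def train(w, inputs, T, reward_fn, episodes, lr, rng):
    """Episodic REINFORCE: w <- w + lr * (R - baseline) * eligibility."""
    w = np.array(w, dtype=float)
    baseline = 0.0
    for _ in range(episodes):
        spikes, elig = run_episode(w, inputs, T, rng)
        reward = reward_fn(spikes)
        w = w + lr * (reward - baseline) * elig
        baseline = 0.9 * baseline + 0.1 * reward  # running mean reduces variance
    return w


def mean_spike_count(w, inputs, T, episodes, rng):
    """Average number of output spikes per episode under weights w."""
    return float(np.mean([len(run_episode(w, inputs, T, rng)[0]) for _ in range(episodes)]))
```

As a toy check, a reward equal to the output spike count (reward_fn=lambda spikes: float(len(spikes))) should push the weights upward. In an actual control task, the hypothetical reward_fn would instead score the environment's response to the emitted spikes. The weight update itself would stay unchanged, because it needs only the scalar reward and the locally accumulated eligibility.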
