Learning in neural networks by reinforcement of irregular spiking.

Artificial neural networks are often trained using the backpropagation algorithm to compute the gradient of an objective function with respect to the synaptic strengths. For a biological neural network, such a gradient computation would be difficult to implement because of the complex dynamics of intrinsic and synaptic conductances in neurons. Here we show that irregular spiking similar to that observed in biological neurons could be used as the basis for a learning rule that calculates a stochastic approximation to the gradient. The learning rule is derived for a special class of model networks in which neurons fire spike trains with Poisson statistics. The learning rule is compatible with forms of synaptic dynamics such as short-term facilitation and depression. By correlating the fluctuations in irregular spiking with a reward signal, the learning rule performs stochastic gradient ascent on the expected reward. It is applied to two examples: learning the XOR computation and learning direction selectivity using depressing synapses. We also show in simulation that the learning rule is applicable to a network of noisy integrate-and-fire neurons.
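
To make the idea concrete, the following is a minimal Python sketch of a reward-modulated rule in this spirit, not the authors' implementation: a single unit emits stochastic (Bernoulli-per-bin) spikes, the fluctuations of the spike train around its expected rate are accumulated into an eligibility trace, and a scalar reward delivered at the end of an episode gates the weight update, giving a stochastic estimate of the gradient of expected reward. The episodic setting, the sigmoidal rate function, and the names firing_prob, run_episode, and update are assumptions made for illustration only.

    # Hedged sketch of reward-modulated learning with stochastic spiking units.
    # Assumptions (not from the paper): one unit, Bernoulli spikes per time bin,
    # sigmoidal rate function, scalar reward R delivered at the end of each episode.
    import numpy as np

    rng = np.random.default_rng(0)

    def firing_prob(w, x):
        """Probability of a spike in one time bin, given inputs x and weights w."""
        return 1.0 / (1.0 + np.exp(-(w @ x)))

    def run_episode(w, x, T=100):
        """Simulate T bins of irregular spiking and accumulate the eligibility trace."""
        eligibility = np.zeros_like(w)
        spikes = np.zeros(T)
        for t in range(T):
            p = firing_prob(w, x)
            s = float(rng.random() < p)      # stochastic spike emission
            spikes[t] = s
            # Fluctuation of the spike train around its expected rate drives learning.
            eligibility += (s - p) * x
        return spikes, eligibility

    def update(w, eligibility, R, lr=0.05):
        """Stochastic gradient ascent on expected reward: correlate reward with spike fluctuations."""
        return w + lr * R * eligibility

    # Example usage on a hypothetical task: reward +1 if the unit fires in most bins, else -1.
    w = rng.normal(size=3)
    x = np.array([1.0, 0.0, 1.0])
    for _ in range(200):
        spikes, e = run_episode(w, x)
        R = 1.0 if spikes.mean() > 0.5 else -1.0
        w = update(w, e, R)

The accumulated term (s - p) * x is the log-likelihood gradient of the observed spike outcomes, so multiplying it by the reward yields an unbiased estimate of the reward gradient under the stated assumptions; the paper derives the corresponding rule for networks of Poisson-spiking neurons with dynamic synapses.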
