Reinforcement learning with a network of spiking agents

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain influencing the synaptic plasticity of the neurons in those regions. We build on this theory to propose a multi-agent learning framework with spiking neurons in the generalized linear model (GLM) formulation as agents, to solve reinforcement learning (RL) tasks. We show that a network of GLM spiking agents connected in a hierarchical fashion, where each spiking agent modulates its firing policy based on local information and a global prediction error, can learn complex action representations to solve RL tasks. We further show how leveraging principles of modularity and population coding inspired from the brain can help reduce variance in the learning updates making it a viable optimization technique.

[1]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[2]  A. P. Georgopoulos,et al.  Neuronal population coding of movement direction. , 1986, Science.

[3]  Francis Crick,et al.  The recent excitement about neural networks , 1989, Nature.

[4]  Andrew G. Barto,et al.  Conjugate Markov Decision Processes , 2011, ICML.

[5]  Daniel Cownden,et al.  Random feedback weights support learning in deep neural networks , 2014, ArXiv.

[6]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[7]  Wolfgang Maass,et al.  Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons , 1996, NIPS.

[8]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[9]  H. Seung,et al.  Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.

[10]  Razvan V. Florian,et al.  Correct equations for the dynamics of the cart-pole system , 2005 .

[11]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[12]  Edward T. Bullmore,et al.  Modular and Hierarchically Modular Organization of Brain Networks , 2010, Front. Neurosci..

[13]  A. Dickinson,et al.  Neuronal coding of prediction errors. , 2000, Annual review of neuroscience.

[14]  Eero P. Simoncelli,et al.  Spatio-temporal correlations and visual signalling in a complete neuronal population , 2008, Nature.

[15]  Philip S. Thomas,et al.  Policy Gradient Coagent Networks , 2011, NIPS.

[16]  Michael I. Jordan,et al.  Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks , 1990, Cogn. Sci..