Recurrent Spiking Networks Solve Planning Tasks

A recurrent spiking neural network is proposed that implements planning as probabilistic inference for finite and infinite horizon tasks. The architecture splits the problem into two parts: the stochastic transient firing of the recurrent network embodies the dynamics of the planning task, and appropriately injected input shapes these dynamics to generate high-reward state trajectories. A general class of reward-modulated plasticity rules for the afferent synapses is presented. The updates optimize the likelihood of obtaining a reward through a variant of the Expectation Maximization algorithm, and learning is guaranteed to converge to a local maximum. We find that the network dynamics are qualitatively similar to transient firing patterns observed during planning and foraging in the hippocampus of awake behaving rats. The model extends classical attractor models and yields a testable prediction for identifying the modulating contextual information. The ability to represent multiple task solutions is investigated in a real robot-arm reaching and obstacle-avoidance task. With its local update rules, the neural planning method provides a basis for future neuromorphic hardware implementations, with promising potential for large-scale data processing and for the early initiation of strategies to avoid dangerous situations in robot co-worker scenarios.

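The abstract compresses the learning scheme into a few sentences; a minimal, hypothetical sketch may help make it concrete. The code below is not the paper's implementation: it assumes a small soft winner-take-all network whose stochastic firing samples state trajectories, a binary goal reward, and a reward-modulated Hebbian update on the afferent synapses standing in for the EM-style maximization step. All sizes, names, and the reward function are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's model): stochastic
# transient firing samples state trajectories, and a reward-modulated
# Hebbian rule on the afferent synapses reweights updates by trajectory
# reward, in the spirit of a sampled (Monte Carlo) EM step.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 10   # network states, one winner-take-all unit each (assumed)
N_INPUTS = 5    # afferent contextual inputs (assumed)
T = 20          # planning horizon in time steps (assumed)
GOAL = 7        # rewarded state (assumed task)
ETA = 0.05      # learning rate (assumed)

# Fixed recurrent transition preferences; learned afferent weights W.
W_rec = rng.normal(0.0, 0.5, (N_STATES, N_STATES))
W = np.zeros((N_STATES, N_INPUTS))
x = rng.normal(0.0, 1.0, N_INPUTS)   # injected contextual input

def sample_trajectory(W):
    """Sample one transient firing sequence (state trajectory)."""
    s, traj = 0, [0]
    for _ in range(T):
        u = W_rec[:, s] + W @ x          # membrane potentials
        p = np.exp(u - u.max())
        p /= p.sum()                     # soft winner-take-all
        s = rng.choice(N_STATES, p=p)
        traj.append(s)
    return traj

def reward(traj):
    """Binary reward: did the trajectory reach the goal state?"""
    return 1.0 if GOAL in traj else 0.0

# EM-flavored loop: the E-step samples trajectories from the current
# network; the M-step applies a reward-weighted Hebbian update to the
# afferent synapses, increasing input drive to states visited on
# rewarded trajectories.
for epoch in range(200):
    trajs = [sample_trajectory(W) for _ in range(20)]
    Rs = np.array([reward(t) for t in trajs])
    baseline = Rs.mean()                 # variance reduction
    for traj, R in zip(trajs, Rs):
        for s in traj[1:]:
            W[s] += ETA * (R - baseline) * x   # modulated Hebbian step

print("P(reward) after learning:",
      np.mean([reward(sample_trajectory(W)) for _ in range(200)]))
```

The (R - baseline) factor is a standard variance-reduction device from reward-weighted and policy-gradient learning; in the paper's setting the corresponding weighting would arise from the reward-weighted posterior over trajectories rather than this simple empirical baseline.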