Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks

Neural activity during cognitive tasks exhibits complex dynamics that flexibly encode task-relevant variables. Chaotic recurrent networks, which spontaneously generate rich dynamics, have been proposed as a model of cortical computation during cognitive tasks. However, existing methods for training these networks are either biologically implausible or require a continuous, real-time error signal to guide learning. Here we show that a biologically plausible learning rule can train such recurrent networks, guided solely by delayed, phasic rewards at the end of each trial. Networks endowed with this learning rule can successfully learn nontrivial tasks requiring flexible (context-dependent) associations, memory maintenance, nonlinear mixed selectivity, and coordination among multiple outputs. The resulting networks replicate complex dynamics previously observed in animal cortex, such as dynamic encoding of task features and selective integration of sensory inputs. We conclude that recurrent neural networks offer a plausible model of cortical dynamics during both learning and performance of flexible behavior.
