Biologically plausible gated recurrent neural networks for working memory and learning-to-learn

The acquisition of knowledge does not occur in isolation; rather, learning experiences in the same or similar domains amalgamate. This process, through which learning accelerates over time, is referred to as learning-to-learn or meta-learning. While meta-learning can be implemented in recurrent neural networks, these networks tend to be trained with architectures that are not easily interpretable or mappable to the brain and with learning rules that are biologically implausible. Specifically, these rules employ backpropagation-through-time, which relies on information that is unavailable at the synapses undergoing plasticity in the brain. While memory models that use only local information for their weight updates have been developed, they have limited capacity to integrate information over long timespans and therefore cannot easily learn to learn. Here, we propose a novel gated recurrent network named RECOLLECT, which can flexibly retain or forget information by means of a single memory gate and is trained with biologically plausible trial-and-error learning that requires only local information. We demonstrate that RECOLLECT successfully learns to represent task-relevant information over increasingly long memory delays in a pro-/anti-saccade task, and that it learns to flush its memory at the end of a trial. Moreover, we show that RECOLLECT can learn to learn an effective policy on a reversal bandit task. Finally, we show that the solutions acquired by RECOLLECT resemble how animals learn similar tasks.
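To make the idea of retaining or forgetting information with a single memory gate concrete, the following is a minimal sketch of a generic single-gate recurrent unit. It is not RECOLLECT's actual equations or its learning rule, neither of which is given in this abstract. The function names, the two-channel input (a cue and an end-of-trial signal), and all weight values are hypothetical and set by hand rather than learned.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(h_prev, x, w_gate, b_gate, w_cand):
    """One step of a scalar memory unit with a single gate.

    gate near 1 keeps the previous memory; gate near 0 overwrites it
    with the candidate value computed from the current input.
    All parameters are illustrative assumptions, not fitted values.
    """
    gate = sigmoid(b_gate + sum(w * xi for w, xi in zip(w_gate, x)))
    candidate = math.tanh(sum(w * xi for w, xi in zip(w_cand, x)))
    return gate * h_prev + (1.0 - gate) * candidate


def run_trial(delay_steps: int):
    """Present a cue, wait through a delay, then signal the end of the trial."""
    # Hand-set weights (assumption): any input closes the gate (overwrite),
    # no input leaves it open (retain). Only the cue drives the candidate.
    w_gate, b_gate, w_cand = (-10.0, -10.0), 5.0, (2.0, 0.0)
    inputs = [(1.0, 0.0)] + [(0.0, 0.0)] * delay_steps + [(0.0, 1.0)]
    h, trace = 0.0, []
    for x in inputs:
        h = gated_step(h, x, w_gate, b_gate, w_cand)
        trace.append(h)
    return trace


if __name__ == "__main__":
    for t, h in enumerate(run_trial(delay_steps=5)):
        print(f"t={t}  memory={h:.3f}")
```

With these hand-picked values, the memory is written when the cue arrives, decays only slightly during the delay, and is overwritten with a near-zero value when the end-of-trial signal arrives. In a trained network the gate weights would be learned rather than fixed as they are here.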
