Reversal Learning in Humans and Gerbils: Dynamic Control Network Facilitates Learning

Biologically plausible modeling of behavioral reinforcement learning tasks has improved considerably over the past decades. Less work has been devoted to tasks involving contingency reversals, i.e., tasks in which the original behavioral goal is reversed one or more times. The ability to adjust to such reversals is a key element of behavioral flexibility. Here, we investigate the neural mechanisms underlying contingency-reversal tasks. We first conduct experiments with humans and gerbils to demonstrate memory effects: across multiple reversals, subjects (humans and animals) learn faster when a previously learned contingency reappears. Motivated by recurrent mechanisms of learning and memory for object categories, we propose a network architecture in which reinforcement learning steers an orienting system that monitors success in reward acquisition. We suggest that a model sensory system provides feature representations that are further processed by category-related subnetworks, which constitute a neural analog of expert networks. Categories are selected dynamically in a competitive field and predict the expected reward. Learning occurs in sequentialized phases that selectively focus weight adaptation on synapses in the hierarchical network and modulate their weight changes by a global modulator signal. The orienting subsystem itself learns to bias the competition as long as reward accumulates continuously and monotonically. When the discrepancy between predicted and acquired reward changes suddenly, the activated motor category can be switched. We suggest that this subsystem is composed of a hierarchically organized network of disinhibitory mechanisms, dubbed a dynamic control network (DCN), which resembles components of the basal ganglia. The DCN selectively activates the expert network that corresponds to the current behavioral strategy. The trace of the accumulated reward is monitored such that large, sudden deviations from the monotonicity of its evolution trigger a reset, after which another expert subnetwork can be activated (if it has already been established) or new categories can be recruited and associated with novel behavioral patterns.
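
The reset-and-reuse logic described above can be illustrated with a deliberately simplified sketch. It is not the DCN itself: the competitive field and the disinhibitory circuitry are replaced by explicit bookkeeping, and every name (`Expert`, `ToyController`, `run`), the ±1 reward scheme, the greedy policy, the learning rate `lr` and the `drop_threshold` are hypothetical choices made only for illustration. Each expert keeps a reward prediction per action, updated by a delta rule in which the prediction error scales the weight change. The controller tracks the accumulated reward and treats a drop of `drop_threshold` below its running maximum as a sudden deviation from monotonic growth. On such a reset it first reactivates a previously established expert that has not yet been tried, and recruits a new expert only when none is left.

```python
# Hypothetical toy model, NOT the DCN from this work: it only illustrates
# "monitor accumulated reward, reset on a sudden drop, reuse or recruit an expert".


class Expert:
    """Hypothetical stand-in for one category/expert subnetwork."""

    def __init__(self, n_actions, lr=0.1):
        self.values = [0.0] * n_actions  # predicted reward for each action
        self.lr = lr

    def act(self):
        # Greedy choice; max() returns the first maximum, so ties go to the lowest index.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Delta rule: the prediction error (reward - prediction) scales the change.
        self.values[action] += self.lr * (reward - self.values[action])


class ToyController:
    """Hypothetical orienting/control loop around a pool of experts."""

    def __init__(self, n_actions, drop_threshold=3.0):
        self.n_actions = n_actions
        self.drop_threshold = drop_threshold
        self.experts = [Expert(n_actions)]
        self.active = 0
        self.tried = {0}     # experts tried since the last confirmed success
        self.total = 0.0     # accumulated reward trace
        self.peak = 0.0      # running maximum of the trace since the last switch
        self.baseline = 0.0  # trace value when the active expert was switched in

    def _switch(self):
        untried = [i for i in range(len(self.experts)) if i not in self.tried]
        if untried:
            self.active = untried[0]  # reuse an already established expert
        else:
            self.experts.append(Expert(self.n_actions))  # recruit a new expert
            self.active = len(self.experts) - 1
        self.tried.add(self.active)
        self.peak = self.baseline = self.total

    def step(self, rewarded_action):
        expert = self.experts[self.active]
        action = expert.act()
        reward = 1.0 if action == rewarded_action else -1.0
        expert.update(action, reward)
        self.total += reward
        self.peak = max(self.peak, self.total)
        if self.peak - self.baseline >= self.drop_threshold:
            self.tried = {self.active}  # steady reward gain: expert confirmed
        if self.peak - self.total >= self.drop_threshold:
            self._switch()  # sudden deviation from monotonic growth: reset
        return action


def run(schedule, n_actions=3):
    """schedule: list of (rewarded_action, n_trials) blocks.
    Returns the error count per block and the final number of experts."""
    ctrl = ToyController(n_actions)
    errors = []
    for rewarded_action, n_trials in schedule:
        errors.append(sum(ctrl.step(rewarded_action) != rewarded_action
                          for _ in range(n_trials)))
    return errors, len(ctrl.experts)


print(run([(0, 20), (2, 20), (0, 20)]))  # prints ([0, 5, 3], 2)
```

In this hypothetical schedule, action 0 is rewarded, then action 2, then action 0 again (20 trials each). The first reversal costs three errors to detect the drop plus two exploratory errors by the newly recruited expert. The return to the familiar contingency costs only the three detection errors, because the first expert is reactivated instead of relearned. This is a crude analog of the faster relearning of a previously learned contingency described above; the fixed detection cost is an artifact of the toy threshold rule.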
