Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control

Goal-directed decision making in biological systems is broadly based on associations between conditional and unconditional stimuli. Such associative learning can be further classified as classical conditioning (correlation-based learning) and operant conditioning (reward-based learning). A number of computational and experimental studies have firmly established the role of the basal ganglia in reward-based learning, whereas the cerebellum plays an important role in developing specific conditioned responses. Although the two are viewed as distinct learning systems, recent animal experiments point toward their complementary role in behavioral learning and also show substantial two-way communication between these two brain structures. Based on this notion of cooperative learning, in this paper we hypothesize that the basal ganglia and cerebellar learning systems work in parallel and interact with each other. We envision that this interaction is influenced by a reward-modulated heterosynaptic plasticity (RMHP) rule at the thalamus, guiding the overall goal-directed behavior. Using a recurrent neural network actor-critic model of the basal ganglia and a feed-forward correlation-based learning model of the cerebellum, we demonstrate that the RMHP rule can effectively balance the outcomes of the two learning systems. This is tested in simulated environments of increasing complexity, with a four-wheeled robot performing a foraging task in both static and dynamic configurations. Although the model uses a simplified level of biological abstraction, we clearly demonstrate that such an RMHP-induced combinatorial learning mechanism leads to more stable and faster learning of goal-directed behaviors than either system alone. Thus, in this paper we provide a computational model for adaptive combination of the basal ganglia and cerebellum learning systems by way of neuromodulated plasticity for goal-directed decision making in biological and bio-mimetic organisms.
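
As a rough illustration of the reward-based half of the model, the critic in an actor-critic scheme is commonly trained with a temporal-difference (TD) error. The sketch below is a textbook tabular TD(0) critic, not the recurrent-network critic described above. The chain environment, the function names `td_error` and `train_critic`, and the values of `lr` and `gamma` are all assumptions chosen for illustration.

```python
def td_error(reward, v_now, v_next, gamma=0.9):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_now


def train_critic(n_states=4, episodes=200, lr=0.1, gamma=0.9):
    """Learn state values on a hypothetical one-way chain of states.

    The agent always steps from state s to s + 1 and receives a reward
    of 1.0 only on the step that leaves the last state (the "goal").
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            terminal = s == n_states - 1
            reward = 1.0 if terminal else 0.0
            # No future value beyond the goal state.
            v_next = 0.0 if terminal else values[s + 1]
            delta = td_error(reward, values[s], v_next, gamma)
            # Move the value estimate a small step along the TD error.
            values[s] += lr * delta
    return values


values = train_critic()
print([round(v, 3) for v in values])
```

With these assumed settings the learned values approach `gamma` raised to the number of steps remaining before the reward, so states closer to the goal are valued more highly. In an actor-critic arrangement the same TD error would also drive the actor's updates; that part is omitted here.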
