Emergent structured transition from variation to repetition in a biologically-plausible model of learning in basal ganglia

Often, when animals encounter an unexpected sensory event, they transition from executing a variety of movements to repeating the movement(s) that may have caused the event. According to a recent theory of action discovery (Redgrave and Gurney, 2006), repetition allows the animal to represent those movements, and the outcome, as an action for later recruitment. The transition from variation to repetition often follows a non-random, structured pattern. While the structure of the pattern can be explained by sophisticated cognitive mechanisms, simpler mechanisms based on dopaminergic modulation of basal ganglia (BG) activity are thought to underlie action discovery (Redgrave and Gurney, 2006). In this paper we ask: can simple BG-mediated mechanisms account for a structured transition from variation to repetition, or are more sophisticated cognitive mechanisms always necessary? To address this question, we present a computational model of BG-mediated biasing of behavior. Unlike most other models of BG function, in our model the BG biases behavior by modulating the cortical response to excitation; the cortical area represents many possible movements; and excitation to the cortical area is topographically organized. We subject the model to simple reaching tasks, inspired by behavioral studies, in which a reach location must be selected and locations within a target area elicit a reinforcement signal. A structured transition from variation to repetition emerges from simple BG-mediated biasing of the cortical response to excitation. We show how the structured pattern influences behavior in both simple and more complicated tasks. We also present analyses that characterize the structured transition from variation to repetition under BG-mediated biasing and under the biasing expected from a form of cognitive control, allowing us to compare the behavior produced by these two kinds of biasing and to make connections with future behavioral experiments.
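To make the mechanism concrete, the sketch below illustrates the key ingredients named in the abstract, not the model described in the paper itself: a one-dimensional "cortical" sheet of units representing candidate reach locations, a topographically organized excitation profile, and a BG-derived multiplicative gain on each unit's response to that excitation. Reinforcement delivered when the selected location falls inside a target zone strengthens the gain at that location, so selection shifts from varied to repeated movements. All names and parameter values (N_UNITS, TARGET, G_MAX, TEMPERATURE, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_UNITS = 50                 # candidate reach locations on a 1-D cortical sheet
TARGET = set(range(20, 25))  # locations whose selection elicits reinforcement
G_MAX = 5.0                  # saturation level for the BG-mediated gain
LEARNING_RATE = 0.3
TEMPERATURE = 0.5

# Topographically organized excitation: a broad Gaussian bump over the sheet.
centres = np.arange(N_UNITS)
excitation = np.exp(-0.5 * ((centres - N_UNITS / 2) / 10.0) ** 2)

# BG-mediated multiplicative gain on cortical response (uniform before learning).
gains = np.ones(N_UNITS)

def select_location():
    """Sample a reach location from gain-modulated cortical activity (softmax)."""
    activity = gains * excitation
    probs = np.exp((activity - activity.max()) / TEMPERATURE)
    probs /= probs.sum()
    return int(rng.choice(N_UNITS, p=probs))

history = []
for trial in range(300):
    loc = select_location()
    reinforcement = 1.0 if loc in TARGET else 0.0
    # Reinforcement strengthens the gain at the selected location (bounded by
    # G_MAX), biasing later trials toward repeating the reinforced movement.
    gains[loc] += LEARNING_RATE * reinforcement * (G_MAX - gains[loc])
    history.append(loc)

print("distinct locations, first 50 trials:", len(set(history[:50])))
print("distinct locations, last 50 trials: ", len(set(history[-50:])))
```

Under these assumptions, early trials sample many distinct locations (variation) while later trials concentrate on the reinforced region (repetition); the structure of that transition depends on how the gain update interacts with the topographic excitation profile.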
