Modeling the Role of Striatum in Stochastic Multi Context Tasks

Decision-making tasks in changing environments with probabilistic reward schemes pose several challenges to the agents performing them. Agents must use the experience gained over past trials to characterize the environment, and this characterization in turn guides their actions. We present two models that predict an agent's behavior in such tasks: the first is a theoretical model that defines a Bayes-optimal solution to the problem under realistic task conditions; the second is a computational model of the basal ganglia that provides a neural mechanism for solving the same problem. Both models are shown to reproduce results from behavioral experiments and are compared to each other. This comparison characterizes the theoretical model as a performance bound on the neural model, and the neural model as a biologically plausible implementation of the theoretical model. Furthermore, we predict agent performance in various stochastic regimes, predictions that could be tested in future studies.
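To make the setting concrete, the sketch below illustrates one way a Bayes-optimal agent could handle a stochastic multi-context task: it maintains a posterior belief over a hidden context in a two-context task with Bernoulli rewards and acts greedily on the expected reward under that belief. This is a minimal illustration under assumed parameters (the reward probabilities REWARD_PROB, the switch rate SWITCH_RATE, and the two-context, two-action structure are all illustrative assumptions), not the specific model described in the paper.

```python
# Minimal sketch (not the authors' implementation): Bayesian context inference
# for a two-context task with Bernoulli (probabilistic) rewards. Reward
# probabilities per context and the context switch rate are illustrative
# assumptions, not values taken from the paper.
import numpy as np

N_CONTEXTS = 2
N_ACTIONS = 2
# Assumed reward probability of each action in each context.
REWARD_PROB = np.array([[0.8, 0.2],   # context 0
                        [0.2, 0.8]])  # context 1
SWITCH_RATE = 0.05                    # assumed per-trial context switch probability


def update_belief(belief, action, reward):
    """Posterior over contexts after observing (action, reward),
    propagated through the assumed context-switch dynamics."""
    likelihood = REWARD_PROB[:, action] if reward else 1.0 - REWARD_PROB[:, action]
    posterior = belief * likelihood
    posterior /= posterior.sum()
    # Predictive step: the context may switch (uniformly) before the next trial.
    return (1 - SWITCH_RATE) * posterior + SWITCH_RATE * (1 - posterior) / (N_CONTEXTS - 1)


def choose_action(belief):
    """Greedy choice: action with the highest expected reward under the belief."""
    expected_reward = belief @ REWARD_PROB
    return int(np.argmax(expected_reward))


# Example run against a simulated switching environment.
rng = np.random.default_rng(0)
belief = np.full(N_CONTEXTS, 1.0 / N_CONTEXTS)
context = 0
for t in range(200):
    if rng.random() < SWITCH_RATE:
        context = 1 - context
    a = choose_action(belief)
    r = rng.random() < REWARD_PROB[context, a]
    belief = update_belief(belief, a, r)
```

In this sketch the agent never observes the context directly; it infers it from the action-reward history, which is the kind of latent-context inference a Bayes-optimal account of such tasks requires.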
