Dynamical Regimes in Neural Network Models of Matching Behavior

The matching law is a quantitative description of choice behavior that is often observed in foraging tasks. According to the matching law, organisms distribute their behavior across the available response alternatives in the same proportion that reinforcers are distributed across those alternatives. Recently, several biophysically plausible neural network models have been proposed to explain the matching behavior observed in experiments. Here we systematically study the learning dynamics of these networks while they perform a matching task on a concurrent variable interval (VI) schedule. We find that the model network can operate in one of three qualitatively different regimes, depending on the parameters that characterize the synaptic dynamics and the reward schedule: (1) a matching behavior regime, in which the probability of choosing an option is roughly proportional to the fractional baiting probability of that option; (2) a perseverative regime, in which the network tends to always make the same decision; and (3) a tristable regime, in which the network can either perseverate or choose between the two targets randomly, with approximately equal probability. Different parameters of the synaptic dynamics lead to different types of deviations from the matching law, some of which have been observed experimentally. We show that the performance of the network depends on the number of stable states of each synapse, and that bistable synapses perform close to optimally when the proper learning rate is chosen. Because our model links synaptic dynamics to qualitatively different behaviors, this work offers insight into the effects of neuromodulators on adaptive behaviors and psychiatric disorders.
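To make the matching law and the baiting schedule concrete, the following is a minimal sketch, not the network model studied here. It assumes a discrete-trial version of the concurrent VI schedule in which each unbaited target becomes baited with a fixed probability per trial and stays baited until it is chosen, and it replaces the network with a hypothetical chooser that picks target A with a fixed probability. The function name, parameter names, and numerical values are illustrative assumptions. Under the matching law, the fraction of choices allocated to A should equal the fraction of rewards harvested from A.

```python
import random


def simulate_concurrent_vi(p_choose_a, bait_a, bait_b, n_trials=100_000, seed=0):
    """Simulate a discrete-trial concurrent VI schedule (illustrative assumption).

    A fixed-probability chooser stands in for the network model.
    Returns (choice_fraction_a, income_fraction_a).
    """
    rng = random.Random(seed)
    bait_prob = {"A": bait_a, "B": bait_b}
    baited = {"A": False, "B": False}
    choices = {"A": 0, "B": 0}
    rewards = {"A": 0, "B": 0}

    for _ in range(n_trials):
        # Baiting step: an unbaited target is baited with its own probability
        # and then stays baited until it is chosen.
        for target in ("A", "B"):
            if not baited[target] and rng.random() < bait_prob[target]:
                baited[target] = True

        # Choice step: the hypothetical chooser ignores reward history.
        choice = "A" if rng.random() < p_choose_a else "B"
        choices[choice] += 1

        # Harvest step: a baited target pays out once and is reset.
        if baited[choice]:
            rewards[choice] += 1
            baited[choice] = False

    total_rewards = rewards["A"] + rewards["B"]
    choice_fraction = choices["A"] / n_trials
    income_fraction = rewards["A"] / total_rewards if total_rewards else float("nan")
    return choice_fraction, income_fraction


if __name__ == "__main__":
    # Scan the choice probability; matching holds where the two fractions agree.
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        c, i = simulate_concurrent_vi(p, bait_a=0.3, bait_b=0.1)
        print(f"p(A)={p:.1f}  choice fraction={c:.3f}  income fraction={i:.3f}")
```

Scanning the choice probability in this way shows that a chooser that strongly favors one target harvests a smaller share of its rewards from that target than its share of choices, because the neglected target is almost always baited when visited. The choice probability at which the two fractions coincide is the matching point for the assumed baiting probabilities.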
