Statistical Mechanics of Reward-Modulated Learning in Decision-Making Networks

The neural substrates of decision making have been studied intensively with both experimental and computational approaches. Two-alternative choice tasks with reinforcement have often been used to investigate decision making, and choice behavior in many such experiments has been found empirically to follow Herrnstein's matching law. A number of theoretical studies have sought to explain the mechanisms underlying matching behavior, proving that various learning rules achieve matching as a steady state of the learning process. The models in these studies, however, involve only a few parameters, whereas a large number of neurons and synapses are expected to participate in decision making in the brain. We therefore investigated learning behavior in simple but large-scale decision-making networks, using the covariance learning rule, which has been shown to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed the model in the thermodynamic limit, in which the number of plastic synapses goes to infinity. Using techniques from statistical mechanics, we derived deterministic differential equations for the order parameters in this limit, which allow an exact calculation of the evolution of choice behavior. We found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are large enough to affect the net input to the value-encoding neurons. This situation arises naturally when the synaptic strengths are sufficiently strong and the excitatory and inhibitory inputs to the value-encoding neurons are balanced. The deviation from matching is caused by an increasing variance of the input potential due to diffusion of the synaptic efficacies, and it produces undermatching, a phenomenon often observed in behavioral experiments.
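The covariance rule referenced above can be illustrated with a minimal simulation. The sketch below is an assumption-laden toy, not the paper's network model: it uses a two-alternative "baited" schedule (each option is rearmed with an illustrative probability and holds its reward until collected, a crude stand-in for concurrent variable-interval schedules), a logistic choice function on the value difference, and arbitrary learning rates. Weight updates follow the covariance form of Loewenstein & Seung (2006), proportional to the product of reward fluctuation and activity fluctuation; at the rule's fixed point, the fraction of choices to an option should approximately equal the fraction of income earned from it (matching).

```python
import numpy as np

rng = np.random.default_rng(0)

# Baited two-alternative schedule: option i is rearmed with probability r[i]
# per trial and stays baited until its reward is collected.
r = np.array([0.15, 0.05])      # baiting probabilities (illustrative values)
baited = np.array([False, False])

w = np.zeros(2)                 # plastic efficacies encoding subjective values
R_bar = 0.0                     # running mean reward (the covariance baseline)
eta, alpha = 0.05, 0.01         # learning rate and baseline rate (assumptions)
T, T_measure = 30000, 10000
choices, rewards = [], []

for t in range(T):
    baited |= rng.random(2) < r
    p0 = 1.0 / (1.0 + np.exp(-(w[0] - w[1])))   # logistic choice on value difference
    a = 0 if rng.random() < p0 else 1
    R = 1.0 if baited[a] else 0.0
    baited[a] = False
    # Covariance rule: dw_i ∝ (reward fluctuation) × (activity fluctuation),
    # with choice indicator a_i minus choice probability p_i as the fluctuation.
    p = np.array([p0, 1.0 - p0])
    a_vec = np.array([1.0 - a, float(a)])
    w += eta * (R - R_bar) * (a_vec - p)
    R_bar += alpha * (R - R_bar)
    if t >= T - T_measure:                      # measure after transients
        choices.append(a)
        rewards.append((a, R))

choice_frac = 1.0 - np.mean(choices)            # fraction of choices to option 0
total_income = sum(R for _, R in rewards)
income_frac = sum(R for a, R in rewards if a == 0) / max(total_income, 1.0)
print(round(choice_frac, 3), round(income_frac, 3))
```

At steady state the richer schedule attracts the majority of choices, and the choice fraction tracks the income fraction; with large fluctuating inputs (the regime analyzed in the paper) this equality would instead be degraded toward undermatching.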

[1] A. Dean, The variability of discharge of simple cells in the cat striate cortex, 1981, Experimental Brain Research.

[2] An Exactly Solvable Model of Unsupervised Learning (Niels Bohr Institute).

[3] Haim Sompolinsky, et al., Chaotic Balanced State in a Model of Cortical Circuits, 1998, Neural Computation.

[4] H. Seung, et al., Linear-Nonlinear-Poisson Models of Primate Choice Dynamics, 2005, Journal of the Experimental Analysis of Behavior, 84, 581–617.

[5] Yonatan Loewenstein, et al., Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity, 2006, Proceedings of the National Academy of Sciences.

[6] H. Seung, et al., Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission, 2003, Neuron.

[7] E. Izhikevich, Solving the distal reward problem through linkage of STDP and dopamine signaling, 2007, BMC Neuroscience.

[8] Xiao-Jing Wang, Decision Making in Recurrent Neuronal Circuits, 2008, Neuron.

[9] R. Kempter, et al., Hebbian learning and spiking neurons, 1999.

[10] H. Sompolinsky, et al., Chaos in Neuronal Networks with Balanced Excitatory and Inhibitory Activity, 1996, Science.

[11] M. Farries, et al., Reinforcement learning with modulated spike timing-dependent synaptic plasticity, 2007, Journal of Neurophysiology.

[12] Xiao-Jing Wang, et al., Probabilistic Decision Making by Slow Reverberation in Cortical Circuits, 2002, Neuron.

[13] Yutaka Sakai, et al., When Does Reward Maximization Lead to Matching Law?, 2008, PLoS ONE.

[14] Masato Okada, et al., Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks, 2010, NIPS.

[15] William R. Softky, et al., The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs, 1993, The Journal of Neuroscience.

[16] Stefano Fusi, et al., Dynamical Regimes in Neural Network Models of Matching Behavior, 2013, Neural Computation.

[17] Xiao-Jing Wang, et al., A Biophysically Based Neural Model of Matching Law Behavior: Melioration by Stochastic Synapses, 2006, The Journal of Neuroscience.

[18] Yutaka Sakai, et al., The Actor-Critic Learning Is Behind the Matching Law: Matching Versus Optimal Behaviors, 2008, Neural Computation.

[19] Razvan V. Florian, Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity, 2007, Neural Computation.

[20] W. Newsome, et al., Choosing the greater of two goods: neural currencies for valuation and decision making, 2005, Nature Reviews Neuroscience.

[21] Robert A. Legenstein, et al., A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback, 2008, PLoS Computational Biology.

[22] Richard S. Sutton, Introduction to Reinforcement Learning, 1998.

[23] L. Abbott, et al., Competitive Hebbian learning through spike-timing-dependent synaptic plasticity, 2000, Nature Neuroscience.

[24] Timothy E. J. Behrens, et al., Choice, uncertainty and value in prefrontal and cingulate cortex, 2008, Nature Neuroscience.

[25] P. Glimcher, et al., Dynamic Response-by-Response Models of Matching Behavior in Rhesus Monkeys, 2005, Journal of the Experimental Analysis of Behavior, 84, 555–579.

[26] P. Killeen, The matching law, 1972, Journal of the Experimental Analysis of Behavior.

[27] W. M. Baum, On two types of deviation from the matching law: bias and undermatching, 1974, Journal of the Experimental Analysis of Behavior.

[28] J. Gold, et al., The neural basis of decision making, 2007, Annual Review of Neuroscience.

[29] Wolfgang Kinzel, Improving a Network Generalization Ability by Selecting Examples, 1990.

[30] Yonatan Loewenstein, et al., Robustness of Learning That Is Based on Covariance-Driven Synaptic Plasticity, 2008, PLoS Computational Biology.

[31] R. Herrnstein, et al., Melioration and Behavioral Allocation, 1980 (book chapter).

[32] Timothy D. Hanks, et al., Probabilistic Population Codes for Bayesian Decision Making, 2008, Neuron.

[33] Xiao-Jing Wang, et al., Neural mechanism for stochastic behaviour during a competitive game, 2006, Neural Networks.

[34] Richard S. Sutton, et al., Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.

[35] L. Abbott, et al., Cascade Models of Synaptically Stored Memories, 2005, Neuron.

[36] W. Newsome, et al., Matching Behavior and the Representation of Value in the Parietal Cortex, 2004, Science.

[37] E. Miller, et al., A Neural Circuit Model of Flexible Sensorimotor Mapping: Learning and Forgetting on Multiple Timescales, 2007, Neuron.

[38] S. Royer, et al., Conservation of total synaptic weight through balanced synaptic depression and potentiation, 2003, Nature.

[39] C. Law, et al., Reinforcement learning can account for associative and perceptual learning on a visual decision task, 2009, Nature Neuroscience.

[40] R. Urbanczik, Self-Averaging and On-Line Learning, 1998, cond-mat/9805339.

[41] Stefano Fusi, et al., Hebbian spike-driven synaptic plasticity for learning patterns of mean firing rates, 2002, Biological Cybernetics.