Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity

The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law. This behavior is remarkably conserved across species and experimental conditions, but its underlying neural mechanisms remain unknown. Here, we propose a neural explanation of this empirical law of behavior. We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity, and we prove mathematically that matching is a generic outcome of such plasticity. Two hypothetical types of synaptic plasticity, embedded in decision-making neural network models, are shown to yield matching behavior in numerical simulations, in accord with our general theorem. We show how this class of models can be tested experimentally by making reward contingent not only on the choices of the subject but also directly on fluctuations in neural activity. Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
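
The covariance argument can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the network models used in the paper. It assumes a two-alternative task in which each alternative becomes "baited" with a fixed probability per trial and stays baited until chosen (a discrete-trial analogue of a concurrent variable-interval schedule), a logistic choice rule p = sigmoid(w) for choosing alternative 0, and the update dw = eta * (R - R_bar) * (A - p), where A = 1 when alternative 0 is chosen and R_bar is a running average of reward. Because A - p has zero mean given the past, the baseline R_bar does not change the expected update, which is proportional to Cov(R, A) = p(1 - p)(E[R | A = 1] - E[R | A = 0]). The weight therefore stops drifting only when the two alternatives yield equal reward per choice, which is the matching condition. The function and parameter names (simulate_covariance_rule, bait_probs, eta, alpha) are hypothetical.

```python
import math
import random


def simulate_covariance_rule(n_trials=100_000, bait_probs=(0.3, 0.1),
                             eta=0.05, alpha=0.01, seed=1):
    """Two-alternative baited task learned with a covariance-type rule.

    Hypothetical setup for illustration (not the paper's network models):
    - each unbaited alternative i becomes baited with probability
      bait_probs[i] on every trial and stays baited until it is chosen;
    - choosing a baited alternative yields reward 1 and clears the bait;
    - alternative 0 is chosen with probability p = sigmoid(w);
    - w changes by eta * (reward - r_bar) * (a - p), with a = 1 if
      alternative 0 was chosen, so its mean is eta * Cov(reward, a).
    Returns per-trial lists of choices (0 or 1) and rewards (0.0 or 1.0).
    """
    rng = random.Random(seed)
    baited = [False, False]
    w, r_bar = 0.0, 0.0
    choices, rewards = [], []
    for _ in range(n_trials):
        # Bait each alternative that is not already baited.
        for i in (0, 1):
            if not baited[i] and rng.random() < bait_probs[i]:
                baited[i] = True
        p = 1.0 / (1.0 + math.exp(-w))
        choice = 0 if rng.random() < p else 1
        reward = 1.0 if baited[choice] else 0.0
        baited[choice] = False  # a harvested (or empty) bait is cleared
        a = 1.0 if choice == 0 else 0.0
        # Covariance rule: reward fluctuation times choice fluctuation.
        w += eta * (reward - r_bar) * (a - p)
        # Running average of reward, used only as a variance-reducing baseline.
        r_bar += alpha * (reward - r_bar)
        choices.append(choice)
        rewards.append(reward)
    return choices, rewards


def choice_and_income_fractions(choices, rewards):
    """Fraction of choices, and of total reward, earned on alternative 0."""
    n0 = sum(1 for c in choices if c == 0)
    income0 = sum(r for c, r in zip(choices, rewards) if c == 0)
    return n0 / len(choices), income0 / sum(rewards)


choices, rewards = simulate_covariance_rule()
half = len(choices) // 2  # discard the initial learning transient
choice_frac, income_frac = choice_and_income_fractions(choices[half:], rewards[half:])
print(f"choice fraction {choice_frac:.3f}, income fraction {income_frac:.3f}")
```

With bait_probs=(0.3, 0.1), treating choices as independent draws with a fixed p, the returns from the two alternatives are equal at a choice probability of roughly 0.79, so the printed choice and income fractions should lie close to each other near that value. The baiting matters: with fixed reward probabilities per alternative, equal returns could only be reached by choosing one alternative exclusively.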