The Accounting Review

convergence of a covariance plasticity rule to a fixed point results in matching behavior (Loewenstein and Seung, 2006; Loewenstein, 2008a). This result is independent of the architecture of the decision-making network, the properties of the constituent neurons, and the specifics of the covariance plasticity rule. The universality of the relation between the fixed-point solution of the covariance synaptic plasticity rule and the matching law of behavior raises the question of whether aspects of the dynamics of convergence to the matching law are also universal. In this paper I study the transient learning dynamics of a general decision-making network in which changes in synaptic efficacies are driven by the covariance between reward and neural activity. I examine the two-alternative repeated-choice schedule, which is typically used in human and animal experiments. I show that the macroscopic behavioral learning dynamics that result from the microscopic synaptic covariance plasticity rule are also general and follow the well-known replicator equation. This result, too, is independent of the decision-making network architecture, the properties of the neurons, and the specifics of the plasticity rule; these determine only the learning rate in the behavioral learning equation. In several examples that I analyze, the learning rate depends on the probabilities of choice: it is approximately proportional to the product of the probabilities of choice raised to a power, where the power depends on the specifics of the model. Some of the findings presented here have appeared previously in abstract form (Loewenstein, 2008b).
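As a rough illustration of the behavioral dynamics described above, the following minimal sketch integrates a two-alternative replicator equation, dp/dt = rate(p) · p(1 − p) · (E[R|A=1] − E[R|A=2]), with a learning rate proportional to (p(1 − p))^alpha. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the parameter values, the Euler integration, and in particular the hypothetical reward schedule, in which the expected reward given a choice is a fixed income divided by the probability of that choice. Under that assumed schedule the fixed point is the matching law, p = r1 / (r1 + r2).

```python
def hypothetical_schedule(p, income=(0.3, 0.1)):
    """Assumed reward schedule (illustrative, not from the paper).

    Returns the expected reward conditional on choosing alternative 1 or 2,
    modeled as a fixed income per alternative divided by its choice probability.
    """
    return income[0] / p, income[1] / (1.0 - p)


def replicator_step(p, eta=1.0, alpha=0.0, dt=0.01):
    """One Euler step of the two-alternative replicator equation.

    p     : probability of choosing alternative 1
    eta   : overall learning-rate scale (assumed value)
    alpha : power applied to the product of choice probabilities (assumed value)
    """
    r1, r2 = hypothetical_schedule(p)
    rate = eta * (p * (1.0 - p)) ** alpha      # probability-dependent learning rate
    dp = rate * p * (1.0 - p) * (r1 - r2)      # replicator drift for two alternatives
    # Keep p strictly inside (0, 1) so the assumed schedule stays finite.
    return min(max(p + dt * dp, 1e-9), 1.0 - 1e-9)


def simulate(p0=0.5, steps=20000, **kwargs):
    """Iterate the Euler step from p0 and return the final choice probability."""
    p = p0
    for _ in range(steps):
        p = replicator_step(p, **kwargs)
    return p


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        print(f"alpha={alpha}: p -> {simulate(alpha=alpha):.4f}")
```

In this sketch, changing alpha alters only how quickly p approaches the fixed point, not the fixed point itself, which is consistent with the claim that model specifics enter only through the learning rate.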

[1] M. Farries et al. Reinforcement learning with modulated spike timing dependent synaptic plasticity. Journal of Neurophysiology, 2007.

[2] Ron Meir et al. Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule. Neural Computation, 2007.

[3] E. Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling. BMC Neuroscience, 2007.

[4] Razvan V. Florian et al. Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity. Neural Computation, 2007.

[5] Yonatan Loewenstein et al. Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proceedings of the National Academy of Sciences, 2006.

[6] Jonathan D. Cohen et al. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 2006.

[7] Xiao-Jing Wang et al. A Biophysically Based Neural Model of Matching Law Behavior: Melioration by Stochastic Synapses. The Journal of Neuroscience, 2006.

[8] Ila R. Fiete et al. Gradient learning in spiking neural networks by dynamic perturbation of conductances. Physical Review Letters, 2006.

[9] P. Glimcher. Indeterminacy in brain and behavior. Annual Review of Psychology, 2005.

[10] W. Newsome et al. Matching Behavior and the Representation of Value in the Parietal Cortex. Science, 2004.

[11] Xiaohui Xie et al. Learning in neural networks by reinforcement of irregular spiking. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2004.

[12] D. Barraclough et al. Prefrontal cortex and decision making in a mixed-strategy game. Nature Neuroscience, 2004.

[13] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.

[14] Xiao-Jing Wang et al. Probabilistic Decision Making by Slow Reverberation in Cortical Circuits. Neuron, 2002.

[15] D. Shanks et al. A Re-examination of Probability Matching and Rational Choice. 2002.

[16] A. Guz. Elastic Waves in Bodies with Initial (Residual) Stresses. 2002.

[17] C. Gallistel et al. The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. Journal of Experimental Psychology: Animal Behavior Processes, 2001.

[18] Nir Vulkan. An Economist's Perspective on Probability Matching. 2000.

[19] R. Kempter et al. Hebbian learning and spiking neurons. 1999.

[20] D. W. Hands. The Matching Law: Papers In Psychology And Economics. 1999.

[21] A. Roth et al. Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria. 1998.

[22] D. Fudenberg et al. The Theory of Learning in Games. 1998.

[23] Josef Hofbauer et al. Evolutionary Games and Population Dynamics. 1998.

[24] Tilman Börgers et al. Learning Through Reinforcement and Replicator Dynamics. 1997.

[25] A. Guz et al. Elastic waves in prestressed bodies interacting with a fluid (survey). 1997.

[26] Hilbert J. Kappen et al. On-line learning processes in artificial neural networks. 1993.

[27] R. Herrnstein et al. Melioration: A Theory of Distributed Choice. 1991.

[28] Michael I. Jordan et al. A more biologically plausible learning rule for neural networks. Proceedings of the National Academy of Sciences of the United States of America, 1991.

[29] Kumpati S. Narendra et al. Learning automata - an introduction. 1989.

[30] M. Davison et al. The matching law: A research review. 1988.

[31] Aleksandr Nikolaevich Guzʹ. Elastic waves in bodies with initial stresses (in Russian). 1986.

[32] A. Guz. Aerohydroelasticity problems for bodies with initial stresses. 1980.

[33] A. N. Guz' et al. Elastic waves in bodies with initial stresses. 1979.

[34] J. Cross. A Stochastic Learning Model of Economic Behavior. 1973.

[35] R. J. Herrnstein et al. Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 1961.

[36] W. Brown. Animal Intelligence: Experimental Studies. Nature, 1912.