Learning of sequential movements by neural network model with dopamine-like reinforcement signal

Abstract

Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond in a similar manner to reward-predicting stimuli. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference (TD) learning algorithm. This study explored a neural network model that used a reward-prediction error signal closely resembling dopamine responses to learn movement sequences. In each step of the sequence, a different stimulus was presented and required a different movement reaction; reward occurred only at the end of a correctly performed sequence. The dopamine-like predictive reinforcement signal allowed the model to learn long sequences efficiently. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer, biologically less plausible durations to reach satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.
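
The temporal difference mechanism the abstract refers to can be summarized compactly: at each step t the model computes a reward-prediction error δ_t = r_t + γV(s_{t+1}) − V(s_t), and this dopamine-like signal, rather than the reward itself, gates the synaptic updates via eligibility traces. The sketch below is a minimal tabular actor-critic illustration of such a scheme applied to a sequence task of the kind described above; the state representation, parameter values, the rule that an incorrect response aborts the trial, and the specific update equations are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal tabular actor-critic sketch of TD learning of a movement sequence.
# Reward is delivered only at the end of a fully correct sequence; the
# TD error delta plays the role of the dopamine-like teaching signal.
# All parameter values below are illustrative assumptions.

rng = np.random.default_rng(0)

N_STEPS = 5        # length of the movement sequence
N_ACTIONS = 3      # possible movement reactions per stimulus
ALPHA = 0.1        # learning rate
GAMMA = 0.95       # discount factor
LAMBDA = 0.3       # eligibility-trace decay (kept short)

V = np.zeros(N_STEPS)                    # critic: value of each sequence step
policy = np.zeros((N_STEPS, N_ACTIONS))  # actor: action preferences per step
correct = rng.integers(0, N_ACTIONS, size=N_STEPS)  # correct response per step

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for episode in range(5000):
    e_V = np.zeros_like(V)        # critic eligibility trace
    e_pi = np.zeros_like(policy)  # actor eligibility trace
    for step in range(N_STEPS):
        probs = softmax(policy[step])
        action = rng.choice(N_ACTIONS, p=probs)

        # Reward only after a fully correct sequence; an error ends the trial.
        if action != correct[step]:
            reward, next_value, terminal = 0.0, 0.0, True
        elif step == N_STEPS - 1:
            reward, next_value, terminal = 1.0, 0.0, True
        else:
            reward, next_value, terminal = 0.0, V[step + 1], False

        # TD error: the dopamine-like reward-prediction error signal.
        delta = reward + GAMMA * next_value - V[step]

        # Decay and accumulate eligibility traces, then apply the
        # TD-error-weighted updates to critic and actor.
        e_V *= GAMMA * LAMBDA
        e_V[step] += 1.0
        e_pi *= GAMMA * LAMBDA
        grad = -probs
        grad[action] += 1.0          # d log pi(a|s) for a softmax policy
        e_pi[step] += grad

        V += ALPHA * delta * e_V
        policy += ALPHA * delta * e_pi

        if terminal:
            break
```

Even with a short trace decay, the TD error bridges the delay to reward by chaining predictions backward through the sequence step by step, which is the advantage the abstract attributes to the dopamine-like predictive signal over an unconditional reinforcement signal that would need long-lasting eligibility traces.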
