Reinforcement learning: Computational theory and biological mechanisms

Reinforcement learning is a computational framework in which an active agent learns behaviors on the basis of a scalar reward signal. The agent can be an animal, a human, or an artificial system such as a robot or a computer program. The reward can be food, water, money, or any other measure of the agent's performance. The theory of reinforcement learning, developed in the artificial intelligence community with intuitions from animal learning theory, now provides a coherent account of the function of the basal ganglia. It serves as a "common language" in which biologists, engineers, and social scientists can exchange their problems and findings. This article reviews the basic theoretical framework of reinforcement learning and discusses its recent and future contributions toward understanding animal behavior and human decision making.
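The core idea described above, an agent improving its behavior from a scalar reward alone, can be illustrated with tabular Q-learning, one of the standard algorithms in this framework. The following is a minimal sketch, not taken from the article: the environment is a hypothetical five-state chain in which moving right eventually reaches a rewarded goal state, and the agent updates its action values from the temporal-difference error between predicted and received reward.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain.

    Action 1 moves one state right, action 0 one state left.
    Reaching the rightmost state yields reward 1; every other
    transition yields reward 0 (the scalar reward signal).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection balances exploration
            # against exploiting the current value estimates
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # temporal-difference error: reward plus discounted
            # value of the next state, minus the current estimate
            delta = r + gamma * max(Q[s2]) - Q[s][a]
            Q[s][a] += alpha * delta
            s = s2
    return Q
```

After training, the greedy policy at every non-terminal state is to move toward the goal, even though the agent was never told which action is correct, only the delayed reward. The temporal-difference error `delta` plays the role that the article attributes to the dopamine signal in the basal ganglia.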
