Adding Prediction Risk to the Theory of Reward Learning

Abstract: This article analyzes the simple Rescorla–Wagner learning rule from the vantage point of least squares learning theory. In particular, it suggests how measures of risk, such as prediction risk, can be used to adjust the learning rate (the learning constant) in reinforcement learning. It argues that prediction risk is most effectively incorporated by scaling the prediction errors. With this scaling, the learning rate needs adjusting only when the covariance between optimal predictions and past (scaled) prediction errors changes. The article also discusses evidence suggesting that the dopaminergic system in the (human and nonhuman) primate brain encodes prediction risk, and that prediction errors are indeed scaled with prediction risk (adaptive encoding).
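
The idea of scaling prediction errors by prediction risk can be illustrated with a minimal sketch. The abstract does not give the article's equations, so everything specific here is an assumption made for illustration: prediction risk is taken to be a running estimate of the expected squared prediction error, the scaling divides each error by the square root of that estimate, and the names `alpha`, `beta`, `risk0`, and `eps` are hypothetical parameters.

```python
import math

def rescorla_wagner_scaled(rewards, alpha=0.1, beta=0.1,
                           v0=0.0, risk0=1.0, eps=1e-8):
    """Rescorla-Wagner updates with risk-scaled prediction errors.

    Illustrative assumptions (not taken from the article):
      - prediction risk = running mean of squared prediction errors
      - scaling = division by sqrt(risk)
    Returns a list of (value, risk, scaled_error) tuples, one per trial.
    """
    v, risk = v0, risk0
    history = []
    for r in rewards:
        delta = r - v                           # raw prediction error
        scaled = delta / math.sqrt(risk + eps)  # error scaled by current risk
        v += alpha * scaled                     # value update on the scaled error
        risk += beta * (delta ** 2 - risk)      # update the risk estimate
        history.append((v, risk, scaled))
    return history

if __name__ == "__main__":
    for v, risk, scaled in rescorla_wagner_scaled([1.0, 0.0, 1.0, 1.0, 0.0]):
        print(f"value={v:.4f} risk={risk:.4f} scaled_error={scaled:.4f}")
```

Under these assumptions the effective step applied to the raw error is `alpha / sqrt(risk)`, so the same `alpha` can serve across reward environments of different variability. A very small risk estimate inflates the step, so a practical version would likely need a floor on `risk`.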
