Predicting risk in a multiple stimulus-reward environment

Publisher Summary

Neurobiological evidence has recently emerged for risk prediction along the lines of simple reinforcement learning. This evidence is limited to one-step-ahead risk, namely, the uncertainty involved in the next forecast or reward, yet multistep prediction risk appears to be more relevant: multiple stimuli may all predict the same future reward or a sequence of future rewards, and only the risk of the total reward matters. Subjects are known to be interested in predicting the total reward (the sum of discounted future rewards), and learning of the expected total reward accords with the temporal difference (TD) algorithm, an extension of simple reinforcement learning proposed for situations with multiple stimuli and rewards. This chapter provides a mathematical rationale for the encoding of one-step-ahead prediction risks by exploring how TD learning could be implemented to learn risk in a multiple-stimuli, multiple-rewards setting. It illustrates total reward risk learning in a simple but generic example, called the multistep risk example, and proposes how TD learning could be used to learn total reward risk. It shows that TD learning is an effective way to learn one-step-ahead prediction risks and that total reward prediction risk can be learned from estimates of the latter. Simulations illustrate how the proposed learning algorithm works.
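The idea of learning total reward risk from one-step-ahead prediction risks can be sketched in a few lines of Python. The sketch below is illustrative only and is not taken from the chapter: it assumes a hypothetical chain of stages, each paying a reward of 1 with probability `p[s]` and 0 otherwise, and it relies on the standard identity that the variance of the discounted total reward equals the expected squared TD error plus `gamma**2` times the variance at the next stage. The function names, parameters, and learning rates are all assumptions made for this example.

```python
import random


def simulate_td_risk(p, gamma=0.9, alpha=0.005, episodes=100000, seed=0):
    """TD-style learning of value, one-step-ahead risk, and total reward risk.

    p[s] is the (assumed) probability of a unit reward at stage s of a
    deterministic chain of stages; the episode ends after the last stage.
    """
    rng = random.Random(seed)
    n = len(p)
    V = [0.0] * (n + 1)  # expected total reward; the terminal entry stays 0
    H = [0.0] * n        # one-step-ahead prediction risk (expected squared TD error)
    R = [0.0] * (n + 1)  # total reward risk; the terminal entry stays 0
    for _ in range(episodes):
        for s in range(n):
            r = 1.0 if rng.random() < p[s] else 0.0
            delta = r + gamma * V[s + 1] - V[s]  # reward prediction error
            sq = delta * delta
            H[s] += alpha * (sq - H[s])          # track the squared prediction error
            xi = sq + gamma ** 2 * R[s + 1] - R[s]  # risk prediction error
            R[s] += alpha * xi
            V[s] += alpha * delta
    return V[:n], H, R[:n]


def exact_values(p, gamma=0.9):
    """Closed-form benchmarks for the same hypothetical chain."""
    n = len(p)
    V = [0.0] * (n + 1)
    R = [0.0] * (n + 1)
    for s in reversed(range(n)):
        V[s] = p[s] + gamma * V[s + 1]
        R[s] = p[s] * (1.0 - p[s]) + gamma ** 2 * R[s + 1]
    H = [q * (1.0 - q) for q in p]  # Bernoulli variance of each stage's reward
    return V[:n], H, R[:n]


if __name__ == "__main__":
    probs = [0.5, 0.8, 0.3]
    learned = simulate_td_risk(probs)
    exact = exact_values(probs)
    for name, est, ref in zip(("value", "one-step risk", "total risk"), learned, exact):
        print(name, [round(x, 3) for x in est], "vs exact", [round(x, 3) for x in ref])
```

With a small constant learning rate the estimates fluctuate around the closed-form values rather than converging exactly; a decaying learning rate would be one way to tighten them, at the cost of slower adaptation.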
