Adaptive properties of differential learning rates for positive and negative outcomes

The concept of the reward prediction error—the difference between reward obtained and reward predicted—continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static “bandit” choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
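As a concrete illustration of the learning rule the abstract describes, below is a minimal sketch of value learning with separate learning rates for positive and negative prediction errors on a static two-armed Bernoulli bandit. The parameter names (alpha_pos, alpha_neg), the softmax temperature beta, and the specific reward probabilities are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def run_bandit(p_reward=(0.8, 0.4), alpha_pos=0.1, alpha_neg=0.4,
               beta=5.0, n_trials=1000, seed=0):
    """Value learning with separate learning rates for positive and
    negative reward prediction errors on a static two-armed bandit."""
    rng = np.random.default_rng(seed)
    q = np.zeros(len(p_reward))             # learned action values
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # softmax action selection over current value estimates
        logits = beta * q
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(q), p=probs)
        r = float(rng.random() < p_reward[a])    # Bernoulli reward
        delta = r - q[a]                         # reward prediction error
        # asymmetric update: different rates for +/- prediction errors
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
        choices[t] = a
    return q, choices

q, choices = run_bandit()
print("final values:", q, "best-arm choice rate:", (choices == 0).mean())
```

Under this update, the expected fixed point of the value estimate for an arm with reward probability p is q* = p·alpha_pos / (p·alpha_pos + (1−p)·alpha_neg), which makes explicit how the asymmetry reshapes the mapping from reward probability to learned value and can stretch apart the learned values of nearby reward probabilities.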
