The functional form of value normalization in human reinforcement learning

Reinforcement learning research in humans and other species indicates that rewards are represented in a context-dependent manner. More specifically, reward representations seem to be normalized as a function of the value of the alternative options. The dominant view postulates that value context-dependence is achieved via a divisive normalization rule, inspired by perceptual decision-making research. However, behavioral and neural evidence points to another plausible mechanism: range normalization. Critically, previous experimental designs were ill-suited to disentangle the divisive and the range normalization accounts, which generate similar behavioral predictions in many circumstances. To address this issue, we designed a new learning task in which we manipulated, across learning contexts, the number of options and the value ranges. Behavioral and computational analyses falsify the divisive normalization account and instead support the range normalization rule. Together, these results shed new light on the computational mechanisms underlying context-dependence in learning and decision-making.
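
To illustrate why manipulating the number of options can separate the two accounts, here is a minimal sketch using common textbook forms of each rule: divisive normalization as each value divided by the sum of the values in the context, and range normalization as each value rescaled between the context minimum and maximum. These functional forms, the function names, and the example values are assumptions made for illustration; they are not taken from the abstract and may differ from the models fitted in the study.

```python
def divisive_normalize(values):
    """Assumed form: each value divided by the sum of all values in the context."""
    total = sum(values)
    if total == 0:
        raise ValueError("divisive normalization is undefined when values sum to zero")
    return [v / total for v in values]


def range_normalize(values):
    """Assumed form: each value rescaled to [0, 1] using the context min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range normalization is undefined when all values are equal")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical contexts sharing the same value range but differing in option count.
two_options = [14, 50]
three_options = [14, 32, 50]

for name, context in [("2 options", two_options), ("3 options", three_options)]:
    div_best = divisive_normalize(context)[-1]
    rng_best = range_normalize(context)[-1]
    print(f"{name}: best option -> divisive {div_best:.3f}, range {rng_best:.3f}")
```

Under these assumptions, adding a third option within the same range lowers the divisively normalized value of the best option, because the denominator grows, while its range-normalized value stays at 1. Contexts that vary the number of options independently of the value range can therefore pull the two predictions apart.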
