Environmental statistics and the trade-off between model-based and TD learning in humans

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence — especially in humans — as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
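The link between volatility, noise, and learning rate can be illustrated with a standard scalar Kalman filter. This is a minimal sketch and not the simulation used in the paper. It assumes the reward mean follows a Gaussian random walk with step variance `volatility_var` and is observed with Gaussian noise of variance `noise_var`. The function name and both parameter names are hypothetical. Under these assumptions the steady-state Kalman gain acts as the asymptotically optimal learning rate, and it rises as volatility increases relative to noise.

```python
import math


def steady_state_learning_rate(volatility_var, noise_var):
    """Steady-state Kalman gain for a random-walk mean observed with noise.

    Assumed model (illustrative, not taken from the paper):
        mean_t   = mean_{t-1} + w_t,  w_t ~ N(0, volatility_var)
        reward_t = mean_t + v_t,      v_t ~ N(0, noise_var)
    """
    q, r = volatility_var, noise_var
    # The steady-state prior variance p solves p**2 - q*p - q*r = 0.
    prior_var = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    # The Kalman gain is the weight given to each new prediction error.
    return prior_var / (prior_var + r)


if __name__ == "__main__":
    # Low volatility / high noise, balanced, high volatility / low noise.
    for q, r in [(0.01, 1.0), (1.0, 1.0), (1.0, 0.01)]:
        rate = steady_state_learning_rate(q, r)
        print(f"volatility={q}, noise={r}: learning rate = {rate:.3f}")
```

The gain depends only on the ratio of volatility to noise, so the high-volatility, low-noise regime that the abstract identifies as worst for model-free TD learning is also the regime in which this sketch yields a learning rate close to 1.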

Nathaniel D. Daw | Dylan A. Simon
