The Curse of Planning: Dissecting Multiple Reinforcement-Learning Systems by Taxing the Central Executive

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people’s choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
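The contrast the abstract draws between the two systems can be made concrete with a small sketch. The code below is a hypothetical illustration. It is not the authors' task or fitted model. The two-stage environment (`TRANSITIONS`), the parameter values (`alpha`, `beta`, `w`), and the choice to arbitrate by a fixed weighted mixture of the two value estimates are all assumptions. In the sketch, the model-free system updates a cached value from a reward prediction error, so reading its value costs a single lookup. The model-based system recomputes values on every decision by looking ahead through a transition model, which is more flexible but costlier. The weight `w` stands in for reliance on the model-based system. Lowering `w`, as one might hypothesize under a concurrent executive-function load, shifts choice toward the cached model-free values.

```python
import math

# Hypothetical two-stage environment (an illustrative assumption): each
# first-stage action leads to second-stage state "A" or "B" with fixed odds.
TRANSITIONS = {
    "left": {"A": 0.7, "B": 0.3},
    "right": {"A": 0.3, "B": 0.7},
}


def model_free_update(q_mf, action, reward, alpha=0.5):
    """Model-free (cached) learning: move the stored value of the chosen
    action toward the obtained reward by a prediction-error step."""
    q_mf = dict(q_mf)
    q_mf[action] += alpha * (reward - q_mf[action])
    return q_mf


def model_based_values(q_stage2, transitions=TRANSITIONS):
    """Model-based (planned) valuation: for each first-stage action, weight
    the best value available in each successor state by its transition
    probability. This loop runs anew at every decision."""
    return {
        action: sum(p * max(q_stage2[state].values()) for state, p in outcomes.items())
        for action, outcomes in transitions.items()
    }


def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}


def softmax(values, beta=3.0):
    """Turn action values into choice probabilities (inverse temperature beta)."""
    m = max(values.values())
    exps = {a: math.exp(beta * (v - m)) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


if __name__ == "__main__":
    # Second-stage values: state "B" currently pays better (hypothetical numbers).
    q_stage2 = {"A": {"x": 0.2, "y": 0.1}, "B": {"x": 0.9, "y": 0.4}}
    # A past reward after "left" leaves a cached model-free preference for "left".
    q_mf = model_free_update({"left": 0.0, "right": 0.0}, "left", reward=1.0)
    q_mb = model_based_values(q_stage2)

    for w in (0.9, 0.1):  # high vs low reliance on the model-based system
        probs = softmax(hybrid_values(q_mb, q_mf, w))
        print(w, max(probs, key=probs.get))
    # → 0.9 right   (planning favors the action that usually reaches "B")
    # → 0.1 left    (cached values favor the previously rewarded action)
```

With a high weight, the planned values steer choice toward the action that usually leads to the richer state. With a low weight, the cached preference for the previously rewarded action wins. The fixed-weight mixture is only one simple way to express the trade-off. The abstract reports that the balance shifts trial by trial with executive demands, which this sketch does not model.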

A. Ross Otto | Samuel J. Gershman | Arthur B. Markman | Nathaniel D. Daw
