The Curse of Planning: Dissecting Multiple Reinforcement-Learning Systems by Taxing the Central Executive

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people’s choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
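The contrast between the two systems can be made concrete with a small sketch in the style of the hybrid valuation models often fit to two-stage choice tasks. It is an illustration under stated assumptions, not the procedure reported here. The transition probabilities (0.7/0.3), the learning rate `alpha`, the inverse temperature `beta`, the mixing weight `w`, and every function name are hypothetical. The model-based valuation consults a transition model on every choice, which is where its computational cost comes from. The model-free valuation reads a cached value. A weight `w` mixes the two, and a lower `w` stands for greater reliance on the model-free system.

```python
import math

# ASSUMPTION: hypothetical two-stage environment. Each first-stage action
# usually ("common", p = 0.7) leads to one second-stage state and sometimes
# ("rare", p = 0.3) to the other. Values are illustrative only.
TRANSITIONS = {
    0: {0: 0.7, 1: 0.3},  # first-stage action 0 -> P(second-stage state)
    1: {0: 0.3, 1: 0.7},  # first-stage action 1 -> P(second-stage state)
}


def model_based_values(q_stage2):
    """Plan with the transition model: each first-stage action is valued as the
    expected best second-stage value. Recomputed on every choice (costly)."""
    return [
        sum(p * max(q_stage2[s]) for s, p in TRANSITIONS[a].items())
        for a in (0, 1)
    ]


def hybrid_values(q_mf, q_stage2, w):
    """Mix model-based and cached model-free values with hypothetical weight w:
    w = 1 is purely model-based, w = 0 purely model-free."""
    q_mb = model_based_values(q_stage2)
    return [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax_choice_probs(values, beta):
    """Softmax choice rule with inverse temperature beta (max-subtracted for
    numerical stability)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def model_free_update(q_mf, action, reward, alpha):
    """Cached update (a simplification): move the chosen action's value toward
    the obtained reward without consulting any model of transitions."""
    q_mf[action] += alpha * (reward - q_mf[action])


def stay_probability_after_rare_reward(w, alpha=0.5, beta=5.0):
    """Hypothetical single trial: action 0 was chosen, a rare transition led to
    second-stage state 1, and reward 1 was obtained. Returns P(repeat action 0)."""
    q_stage2 = [[0.0, 0.0], [0.0, 0.0]]  # values of second-stage actions
    q_mf = [0.0, 0.0]                     # cached first-stage values
    reward = 1.0
    # Learn the second-stage value of the action taken in state 1.
    q_stage2[1][0] += alpha * (reward - q_stage2[1][0])
    # The model-free system credits the reward directly to first-stage action 0.
    model_free_update(q_mf, action=0, reward=reward, alpha=alpha)
    probs = softmax_choice_probs(hybrid_values(q_mf, q_stage2, w), beta)
    return probs[0]
```

Under these assumed parameters, a purely model-free learner (`w = 0`) tends to repeat an action that was rewarded even after a rare transition. A purely model-based learner (`w = 1`) tends to switch, because it credits the reward to the second-stage state and knows the other action reaches that state more often. Lowering `w` is one way to express the increased model-free reliance described above.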
