Retrospective Revaluation in Sequential Decision Making: A Tale of Two Systems

Journal of Experimental Psychology: General

Recent computational theories of decision making in humans and animals have portrayed 2 systems locked in a battle for control of behavior. One system, variously termed model-free or habitual, favors actions that have previously led to reward, whereas a second, called the model-based or goal-directed system, favors actions that causally lead to reward according to the agent's internal model of the environment. Some evidence suggests that control can be shifted between these systems using neural or behavioral manipulations, but other evidence suggests that the systems are more intertwined than a competitive account would imply. In 4 behavioral experiments, using a retrospective revaluation design and a cognitive load manipulation, we show that human decisions are more consistent with a cooperative architecture in which the model-free system controls behavior, whereas the model-based system trains the model-free system by replaying and simulating experience.
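The cooperative architecture described above can be illustrated with a Dyna-style sketch in Python. This is not the authors' model or task, only a minimal illustration under stated assumptions: the toy states `A`, `B`, `C`, the learning rate `ALPHA`, the discount factor `GAMMA`, and the helper names `q_update` and `run` are all hypothetical. In phase 1 the agent learns that action 0 in `A` leads to `B` and action 1 leads to `C`, with no reward. In phase 2 it experiences `B` (rewarded) and `C` (unrewarded) directly, without revisiting `A`. A purely model-free learner never revalues its choices at `A`; replaying stored transitions lets the model-free values at `A` catch up.

```python
import random

ALPHA, GAMMA = 0.5, 0.9   # assumed learning rate and discount factor
ACTIONS = [0, 1]          # hypothetical action set shared by all states


def q_update(q, s, a, r, s_next):
    """One model-free temporal-difference (Q-learning) update."""
    future = 0.0
    if s_next is not None:  # None marks a terminal transition
        future = max(q.get((s_next, b), 0.0) for b in ACTIONS)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + ALPHA * (r + GAMMA * future - old)


def run(n_replay, seed=0):
    """Learn from real experience, then replay n_replay stored transitions."""
    rng = random.Random(seed)
    q, model = {}, {}
    # Phase 1: transitions out of A, no reward yet.
    phase1 = [("A", 0, 0.0, "B"), ("A", 1, 0.0, "C")]
    # Phase 2: B and C are experienced directly; only B is rewarded.
    phase2 = [("B", 0, 1.0, None), ("C", 0, 0.0, None)]
    for s, a, r, s_next in phase1 * 5 + phase2 * 5:
        q_update(q, s, a, r, s_next)      # model-free learning from experience
        model[(s, a)] = (r, s_next)       # model-based system stores the transition
    # Offline replay: the stored model trains the model-free values.
    keys = sorted(model)
    for _ in range(n_replay):
        s, a = rng.choice(keys)
        r, s_next = model[(s, a)]
        q_update(q, s, a, r, s_next)
    return q


no_replay = run(n_replay=0)
with_replay = run(n_replay=200)
print(no_replay[("A", 0)], no_replay[("A", 1)])
print(with_replay[("A", 0)], with_replay[("A", 1)])
```

Without replay both values at `A` stay at zero, so the learner is indifferent; with replay the value of action 0 at `A` rises above that of action 1. Under the further assumption that cognitive load limits offline replay, load could be modeled here as a smaller `n_replay`.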
