A note on the analysis of two-stage task results: How changes in task structure affect what model-free and model-based strategies predict about the effects of reward and transition on the stay probability

Many studies that aim to detect model-free and model-based influences on behavior employ two-stage behavioral tasks of the type pioneered by Daw and colleagues in 2011. Such studies commonly modify existing two-stage decision paradigms in order to better address a given hypothesis, which is an important means of scientific progress. It is, however, critical to fully appreciate the impact of any modified or novel experimental design features on the expected results. Here, we use two concrete examples to demonstrate that relatively small changes in the two-stage task design can substantially change the pattern of actions taken by model-free and model-based agents as a function of the reward outcomes and transitions on previous trials. In the first, we show that, under specific conditions, purely model-free agents will produce the reward by transition interactions typically thought to characterize model-based behavior on a two-stage task. The second example shows that model-based agents’ behavior is driven by a main effect of transition-type in addition to the canonical reward by transition interaction whenever the reward probabilities of the final states do not sum to one. Together, these examples emphasize the task-dependence of model-free and model-based behavior and highlight the benefits of using computer simulations to determine what pattern of results to expect from both model-free and model-based agents performing a given two-stage decision task in order to design choice paradigms and analysis strategies best suited to the current question.

[1]  P. Dayan,et al.  Disorders of compulsivity: a common bias towards learning habits , 2014, Molecular Psychiatry.

[2]  R. Dolan,et al.  Dopamine Enhances Model-Based over Model-Free Choice Behavior , 2012, Neuron.

[3]  N. Daw,et al.  Model-based learning protects against forming habits , 2015, Cognitive, Affective, & Behavioral Neuroscience.

[4]  Thomas H. B. FitzGerald,et al.  Disruption of Dorsolateral Prefrontal Cortex Decreases Model-Based in Favor of Model-free Control in Humans , 2013, Neuron.

[5]  Dylan A. Simon,et al.  Model-based choices involve prospective neural activity , 2015, Nature Neuroscience.

[6]  P. Dayan,et al.  Model-based influences on humans’ choices and striatal prediction errors , 2011, Neuron.

[7]  A. Markman,et al.  The Curse of Planning: Dissecting Multiple Reinforcement-Learning Systems by Taxing the Central Executive , 2013 .

[8]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[9]  Bernard W. Balleine,et al.  Actions, Action Sequences and Habits: Evidence That Goal-Directed and Habitual Action Control Are Hierarchically Organized , 2013, PLoS Comput. Biol..

[10]  Nathaniel D. Daw,et al.  Cognitive Control Predicts Use of Model-based Reinforcement Learning , 2014, Journal of Cognitive Neuroscience.

[11]  Wouter Kool,et al.  When Does Model-Based Control Pay Off? , 2016, PLoS Comput. Biol..

[12]  F. Cushman,et al.  Habitual control of goal selection in humans , 2015, Proceedings of the National Academy of Sciences.

[13]  R. Dolan,et al.  Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making , 2015, Proceedings of the National Academy of Sciences.

[14]  Valerie Voon,et al.  Distinct cortico-striatal connections with subthalamic nucleus underlie facets of compulsivity , 2017, Cortex.

[15]  Alice Y. Chiang,et al.  Working-memory capacity protects model-based learning from stress , 2013, Proceedings of the National Academy of Sciences.

[16]  Shu-Chen Li,et al.  Of goals and habits: age-related and individual differences in goal-directed decision-making , 2013, Front. Neurosci..

[17]  L. Deserno,et al.  Model-Based and Model-Free Decisions in Alcohol Dependence , 2014, Neuropsychobiology.