Mouse tracking reveals structure knowledge in the absence of model-based choice

Converging evidence indicates that humans use two distinct strategies when learning in complex environments. One is model-free learning, i.e., simple reinforcement of rewarded actions; the other is model-based learning, which takes the structure of the environment into account. Recent work has argued that people exhibit little model-based behavior unless it leads to higher rewards. Here we use mouse tracking to study model-based learning in stochastic and deterministic (pattern-based) environments of varying difficulty. In both tasks, participants’ mouse movements reveal that they learned the structure of their environment, even though standard behavior-based estimates suggested no such learning in the stochastic task. Thus, we argue that mouse tracking can reveal people’s subjective beliefs and whether they have structure knowledge, which is necessary but not sufficient for model-based choice. These data demonstrate that people often do not use their structure knowledge to make good choices.
