Imaginative Reinforcement Learning: Computational Principles and Neural Mechanisms

Imagination enables us not only to transcend reality but also to learn about it. In the context of reinforcement learning, an agent can rationally update its value estimates by simulating an internal model of the environment, provided that the model is accurate. In a series of sequential decision-making experiments, we investigated the impact of imaginative simulation on subsequent decisions. We found that imagination can cause people to pursue imagined paths, even when these paths are suboptimal. This bias is systematically related to participants' optimism about how much reward they expect to receive along imagined paths; providing feedback strongly attenuates the effect. The imagination effect can be captured by a reinforcement learning model that includes a bonus added to imagined rewards. Using fMRI, we show that a network of regions associated with valuation is predictive of the imagination effect. These results suggest that imagination, although a powerful tool for learning, is also susceptible to motivational biases.
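The abstract describes the model only verbally. The following is a minimal sketch of the general idea, assuming a tabular Q-learner that updates both from real experience and from model-simulated ("imagined") transitions, with an optimism bonus added only to imagined rewards. The function names, parameter values, and toy environment (e.g. update_from_imagination, IMAGINATION_BONUS) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a tabular Q-learner that
# learns from real transitions veridically, but inflates the reward of
# imagined transitions by an optimism bonus, biasing choice toward
# imagined paths.

from collections import defaultdict

ALPHA = 0.1               # learning rate
GAMMA = 0.95              # discount factor
IMAGINATION_BONUS = 0.5   # hypothetical bonus added to imagined rewards

Q = defaultdict(float)    # Q[(state, action)] -> value estimate


def update(state, action, reward, next_state, actions):
    """Standard Q-learning update from a (real or imagined) transition."""
    best_next = max(Q[(next_state, a)] for a in actions) if actions else 0.0
    td_error = reward + GAMMA * best_next - Q[(state, action)]
    Q[(state, action)] += ALPHA * td_error


def update_from_experience(state, action, reward, next_state, actions):
    """Real transitions are learned from without any bonus."""
    update(state, action, reward, next_state, actions)


def update_from_imagination(state, action, model, actions):
    """Imagined transitions are drawn from an internal model, and the
    imagined reward is inflated by the optimism bonus."""
    next_state, reward = model(state, action)   # simulated outcome
    update(state, action, reward + IMAGINATION_BONUS, next_state, actions)


if __name__ == "__main__":
    # Toy example: the internal model sends every action from 'start' to 'goal'.
    actions = ["left", "right"]

    def model(state, action):
        return "goal", 1.0

    update_from_experience("start", "left", 1.0, "goal", actions)
    update_from_imagination("start", "right", model, actions)

    # The imagined action ends up overvalued relative to the experienced one.
    print(Q[("start", "left")], Q[("start", "right")])
```

Under these assumptions, the agent comes to prefer the imagined path even when its true reward is no better than the experienced one, which is the qualitative pattern the abstract reports.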
