A common model explaining flexible decision making, grid fields and cognitive control

A central difficulty for computational theories of planning is that the value of an action taken now depends on which actions are chosen afterward, so optimal choices are coupled across states. We argue that this interdependence underlies a recurring challenge for reinforcement learning models: explaining both the brain's flexibility and its inflexibility. Building on advances in control engineering, we propose a model of decision-making in the brain that is more efficient, flexible and biologically realistic than previous attempts. It replaces the classic iterative optimization with a linear approximation that addresses interdependence by softly maximizing around a default policy. This solution exposes connections between seemingly disparate phenomena across neuroscience, notably flexible replanning with biases and cognitive control. It also gives new insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.
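To make the idea of "softly maximizing around a default policy" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the linear approximation takes the form of a linearly solvable Markov decision problem from the control-engineering literature: deviating from the default policy is penalized by a KL-divergence cost scaled by `lam`, and the exponentiated values of the nonterminal states then follow from a single linear solve instead of iterative optimization. The function name `solve_linear_rl`, its arguments, and the five-state chain are hypothetical and chosen only for illustration.

```python
import numpy as np

def solve_linear_rl(P, r, terminal, lam=1.0):
    """Sketch of a linearly solvable decision problem (assumed formulation).

    P        : (n, n) transition matrix under the default policy, rows sum to 1
    r        : (n,) reward per state
    terminal : (n,) boolean mask marking terminal (goal) states
    lam      : weight of the KL cost for deviating from the default policy
    Returns the value v of each state and the softly optimal policy.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    T = np.asarray(terminal, dtype=bool)
    N = ~T

    # Work with exponentiated values z = exp(v / lam), which obey a linear equation.
    z = np.empty(len(r))
    z[T] = np.exp(r[T] / lam)

    # Nonterminal states: (diag(exp(-r_N / lam)) - P_NN) z_N = P_NT z_T
    A = np.diag(np.exp(-r[N] / lam)) - P[np.ix_(N, N)]
    z[N] = np.linalg.solve(A, P[np.ix_(N, T)] @ z[T])

    v = lam * np.log(z)

    # Softly optimal policy: the default policy reweighted by successor desirability.
    policy = P * z[None, :]
    policy /= policy.sum(axis=1, keepdims=True)
    return v, policy

# Hypothetical example: a five-state corridor with a goal at the right end.
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5            # reflecting left wall
for s in range(1, n - 1):
    P[s, s - 1] = P[s, s + 1] = 0.5  # unbiased random walk as the default policy
P[n - 1, n - 1] = 1.0              # goal state is absorbing

r = np.array([-1.0, -1.0, -1.0, -1.0, 5.0])  # step cost, goal reward
terminal = np.array([False, False, False, False, True])

v, policy = solve_linear_rl(P, r, terminal, lam=1.0)
print("values:", np.round(v, 3))
print("policy at state 3:", np.round(policy[3], 3))
```

Because the goal reward enters only through the terminal term on the right-hand side, changing the goal in this sketch reuses the same matrix `A` for the nonterminal states. Larger values of `lam` keep the resulting policy closer to the default random walk, while smaller values make it more nearly deterministic.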
