Origin of perseveration in the trade-off between reward and complexity

When humans and other animals make repeated choices, they tend to repeat previously chosen actions independently of reward history, a phenomenon known as perseveration. This paper locates the origin of perseveration in a trade-off between two computational goals: maximizing rewards and minimizing the complexity of the action policy. We develop an information-theoretic formalization of policy complexity and show how optimizing the reward-complexity trade-off gives rise to perseveration. Analysis of two data sets reveals that people attain near-optimal trade-offs. Parameter estimation and model comparison support the claim that perseveration quantitatively follows the theoretically predicted functional form.
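The optimization underlying this claim can be made concrete. Below is a minimal sketch, assuming the standard rate-distortion-style formalization in which policy complexity is the mutual information I(S;A) between states and actions: for a trade-off parameter beta, the optimal policy satisfies pi(a|s) ∝ P(a) exp(beta * Q(s,a)) and can be found by a Blahut-Arimoto-style fixed-point iteration. The reward matrix Q, state distribution p_s, and beta values here are illustrative assumptions, not taken from the paper or its data sets.

```python
import numpy as np

def optimal_policy(Q, p_s, beta, n_iter=200):
    """Blahut-Arimoto-style fixed point: pi(a|s) ∝ p(a) * exp(beta * Q[s, a])."""
    n_actions = Q.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)   # initial action marginal
    for _ in range(n_iter):
        pi = p_a * np.exp(beta * Q)             # unnormalized policy, broadcast over states
        pi /= pi.sum(axis=1, keepdims=True)     # normalize within each state
        p_a = p_s @ pi                          # re-estimate the action marginal
    return pi, p_a

# Illustrative values (hypothetical, not from the paper's data sets):
# two states, two actions, rewards weakly favoring a different action in each state.
Q = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p_s = np.array([0.5, 0.5])

for beta in (0.5, 5.0):                         # low vs. high tolerance for complexity
    pi, p_a = optimal_policy(Q, p_s, beta)
    print(f"beta = {beta}:\npi(a|s) =\n{pi}\np(a) = {p_a}\n")
```

At small beta (a tight complexity budget), the iteration converges to a nearly state-independent policy that reproduces the action marginal P(a), so frequently chosen actions are repeated irrespective of reward; this is the perseverative signature. At large beta the policy differentiates by state and tracks Q.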
