Learning to allocate limited time to decisions with different expected outcomes

This article investigates how human participants allocate limited time across decisions with different properties. We report the results of two behavioral experiments. In each trial, the participant had to accumulate noisy information to make a decision. Participants received positive rewards for correct decisions and negative rewards for incorrect ones. The stimulus was designed such that decisions based on more accumulated information were more accurate but took longer. Therefore, the total outcome that a participant could achieve within the limited duration of the experiment depended on her "decision threshold", the amount of information she required before making a decision. In the first experiment, two types of trials, hard and easy, were randomly intermixed. Crucially, the hard trials were associated with smaller positive and negative rewards than the easy trials. A cue presented at the beginning of each trial indicated the type of the upcoming trial. The optimal strategy was to adopt a small decision threshold for hard trials. The results showed that several participants did not learn this simple strategy. We then investigated how the participants adjusted their decision thresholds based on the feedback they received in each trial. To this end, we developed and compared 10 computational models of decision-threshold adjustment. The models differ in their assumptions about the shape of the decision thresholds and about the way feedback is used to adjust them. Bayesian model comparison showed that the most likely model was one with time-varying thresholds whose parameters are updated by a reinforcement learning algorithm. In the second experiment, no cues were presented. We show that the optimal strategy in this case is to use a single time-decreasing decision threshold for all trials. The computational modeling showed that the participants did not use this optimal strategy. Instead, they attempted to detect the difficulty of the trial first and then set their decision threshold accordingly.
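The trade-off in the cued experiment can be illustrated with a minimal sketch. It is not the authors' model: it assumes a drift-diffusion process with constant, symmetric thresholds and an unbiased starting point, for which accuracy and mean decision time have standard closed forms. The drift rates, reward magnitudes, inter-trial interval, and all function names below are hypothetical values chosen only for illustration.

```python
import math

def accuracy(drift, threshold, noise=1.0):
    """Probability of a correct choice for symmetric thresholds at +/- threshold."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * threshold / noise**2))

def mean_decision_time(drift, threshold, noise=1.0):
    """Mean time to reach either threshold (drift must be non-zero)."""
    return (threshold / drift) * math.tanh(drift * threshold / noise**2)

def reward_rate(thr_easy, thr_hard, easy, hard, iti=1.0):
    """Expected reward per unit time when easy and hard trials are equally likely.

    `easy` and `hard` are (drift, reward) pairs; a correct answer earns
    +reward and an error earns -reward. `iti` is an assumed fixed delay
    between trials.
    """
    gain, duration = 0.0, 0.0
    for thr, (drift, reward) in ((thr_easy, easy), (thr_hard, hard)):
        p = accuracy(drift, thr)
        gain += 0.5 * reward * (2.0 * p - 1.0)
        duration += 0.5 * (mean_decision_time(drift, thr) + iti)
    return gain / duration

def best_thresholds(easy, hard, step=0.05, n_steps=61):
    """Grid search for the cue-specific thresholds that maximize reward rate."""
    grid = [i * step for i in range(n_steps)]
    best = (0.0, 0.0, float("-inf"))
    for thr_easy in grid:
        for thr_hard in grid:
            rr = reward_rate(thr_easy, thr_hard, easy, hard)
            if rr > best[2]:
                best = (thr_easy, thr_hard, rr)
    return best

# Illustrative (assumed) values: hard trials have a lower drift rate
# and smaller rewards and penalties than easy trials.
easy_trial = (2.0, 1.0)
hard_trial = (0.5, 0.2)
thr_easy, thr_hard, rate = best_thresholds(easy_trial, hard_trial)
print(f"easy threshold={thr_easy:.2f}, hard threshold={thr_hard:.2f}, rate={rate:.3f}")
```

Under these assumed values the search settles on a much smaller threshold for hard trials than for easy ones, because time spent on a low-value hard trial is better spent reaching the next trial. The time-varying thresholds and learning rules compared in the article are not captured by this constant-threshold sketch.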
