A new model of decision processing in instrumental learning tasks

Learning and decision making are interactive processes, yet cognitive models of error-driven learning and of decision making have largely evolved separately. Recently, evidence accumulation models (EAMs) of decision making and reinforcement learning (RL) models of error-driven learning have been combined into joint RL-EAMs that can in principle address these interactions. However, we show that the most commonly used combination, based on the diffusion decision model (DDM) for binary choice, consistently fails to capture crucial aspects of the response times observed during reinforcement learning. We propose a new RL-EAM based on an advantage racing diffusion (ARD) framework for choices among two or more options that not only addresses this problem but also captures stimulus difficulty, speed-accuracy trade-off, and stimulus-response-mapping reversal effects. The RL-ARD avoids fundamental limitations that the DDM imposes on addressing effects of the absolute values of choice options, as well as on extensions beyond binary choice, and provides a computationally tractable basis for wider applications.
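
To make the idea of a joint RL-EAM concrete, the following is a minimal simulation sketch, not the authors' implementation. The abstract gives no equations, so everything here is an assumption for illustration: a delta-rule update of value estimates, drift rates built from a baseline term plus weighted difference and sum of the two value estimates (one plausible reading of an "advantage" formulation), and the function names, parameter names, and parameter values. The only standard result relied on is that the first-passage time of a diffusion with positive drift v to a single threshold a (unit noise) follows a Wald (inverse Gaussian) distribution with mean a/v and shape a^2.

```python
import numpy as np


def delta_update(q, reward, alpha):
    """Error-driven (delta-rule) update of one value estimate."""
    return q + alpha * (reward - q)


def advantage_drifts(q, v0, wd, ws):
    """Hypothetical drift rates for two racing accumulators.

    Each accumulator gets a baseline v0, plus wd times the advantage of
    its own option over the other, plus ws times the summed values.
    """
    q = np.asarray(q, dtype=float)
    diff = q - q[::-1]   # advantage of each option over the other
    total = q.sum()      # sum term, sensitive to absolute values
    return v0 + wd * diff + ws * total


def simulate(n_trials=200, p_reward=(0.8, 0.2), alpha=0.1, v0=2.0,
             wd=2.0, ws=0.5, threshold=1.5, t0=0.2, seed=0):
    """Simulate choices and response times in a two-option learning task."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                      # value estimates start at zero
    choices = np.empty(n_trials, dtype=int)
    rts = np.empty(n_trials)
    for t in range(n_trials):
        # Keep drifts strictly positive so the Wald mean is defined.
        v = np.maximum(advantage_drifts(q, v0, wd, ws), 1e-6)
        # One first-passage time per accumulator (unit diffusion noise).
        finish = rng.wald(threshold / v, threshold ** 2)
        c = int(np.argmin(finish))       # the fastest accumulator wins
        choices[t] = c
        rts[t] = t0 + finish[c]          # add non-decision time
        reward = float(rng.random() < p_reward[c])
        q[c] = delta_update(q[c], reward, alpha)
    return choices, rts, q


if __name__ == "__main__":
    choices, rts, q = simulate()
    print("proportion choosing option 0:", np.mean(choices == 0))
    print("mean RT (s):", rts.mean())
    print("final value estimates:", q)
```

In this sketch the difference term drives which option tends to win, while the sum term lets higher overall values speed both accumulators, which is the kind of absolute-value effect the abstract says a DDM-based combination cannot address. Because each option has its own accumulator, the same race structure could in principle be extended to more than two options.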
