Joint modeling of reaction times and choice improves parameter identifiability in reinforcement learning models

Reinforcement learning models provide excellent descriptions of learning in a variety of tasks, and many researchers are interested in relating the parameters of these models to psychological or neural variables of interest. We demonstrate that parameter identification is difficult because a wide range of parameter values yield approximately equally good fits to data. This identification problem has a large impact on statistical power: we show that a researcher who wants to detect a medium-sized correlation (r = .3) between a psychological or neural variable and learning rate with 80% power must collect 60% more subjects than a typical power analysis specifies, in order to account for the noise introduced by model fitting. We introduce a method that exploits the information contained in reaction times to constrain model fitting, and we show with simulations and empirical data that it improves the recovery of learning rates.
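To make the identifiability problem and the proposed remedy concrete, the sketch below simulates a two-armed bandit, fits a Rescorla-Wagner learning rate by maximum likelihood from choices alone, and then refits it jointly with reaction times. This is a minimal illustration under stated assumptions, not the paper's implementation: the log-normal RT model (faster responses when the value difference is larger) stands in for a full sequential-sampling likelihood, and all function names and parameter settings here are hypothetical.

```python
# Illustrative parameter-recovery sketch: fit a Q-learning model to simulated
# two-armed-bandit data, from choices alone vs. jointly with reaction times.
# The log-normal RT model is a simplifying assumption made for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
N_TRIALS, BETA = 200, 3.0          # softmax inverse temperature held fixed
P_REWARD = np.array([0.7, 0.3])    # reward probabilities of the two arms

def simulate(alpha):
    """Generate choices, rewards, and RTs from a Q-learner with learning rate alpha."""
    q = np.zeros(2)
    choices, rewards, rts = [], [], []
    for _ in range(N_TRIALS):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))   # softmax choice rule
        c = int(rng.random() < p1)
        r = float(rng.random() < P_REWARD[c])
        # Assumed RT model: log-normal, faster when the value difference is large
        rt = rng.lognormal(mean=-0.5 * abs(q[1] - q[0]), sigma=0.3)
        q[c] += alpha * (r - q[c])                          # Rescorla-Wagner update
        choices.append(c); rewards.append(r); rts.append(rt)
    return np.array(choices), np.array(rewards), np.array(rts)

def neg_loglik(alpha, choices, rewards, rts, use_rts):
    """Negative log-likelihood of choices (and, optionally, RTs) given alpha."""
    q, nll = np.zeros(2), 0.0
    for c, r, rt in zip(choices, rewards, rts):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))
        nll -= np.log(p1 if c == 1 else 1.0 - p1)           # choice likelihood term
        if use_rts:                                         # log-normal RT likelihood term
            mu, sigma = -0.5 * abs(q[1] - q[0]), 0.3
            nll -= (-np.log(rt * sigma * np.sqrt(2 * np.pi))
                    - (np.log(rt) - mu) ** 2 / (2 * sigma ** 2))
        q[c] += alpha * (r - q[c])
    return nll

choices, rewards, rts = simulate(alpha=0.3)
for use_rts in (False, True):
    fit = minimize_scalar(neg_loglik, bounds=(0.01, 0.99), method="bounded",
                          args=(choices, rewards, rts, use_rts))
    print(f"use_rts={use_rts}: recovered alpha = {fit.x:.3f} (true 0.30)")
```

Because the simulated RTs depend on the evolving value difference, which is itself shaped by the learning rate, the joint likelihood carries extra information about alpha; averaged over many simulated data sets, this is the mechanism by which RTs can tighten parameter recovery.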
