A hierarchical Bayesian approach to assess learning and guessing strategies in reinforcement learning

Abstract: In two-armed bandit tasks, participants learn which stimulus in a stimulus pair is associated with the higher value. In typical reinforcement learning studies, participants are presented with several pairs in random order, and frequently applied analyses assume that each pair is learned in a similar way. When tasks become more difficult, however, participants may learn some stimulus pairs but fail to learn others; that is, they simply guess for a subset of pairs. We put forward the Reinforcement Learning/Guessing (RLGuess) model, which enables researchers to model this learning and guessing process. We implemented the model in a Bayesian hierarchical framework. Simulations showed that the RLGuess model outperforms a standard reinforcement learning model when participants guess: fit is enhanced and parameter estimates become unbiased. An empirical application illustrates the merits of the RLGuess model.
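The learning-versus-guessing idea can be illustrated with a minimal sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration: a delta-rule learner with a softmax choice rule stands in for the "learning" component, guessing is taken to mean choosing either stimulus with probability 0.5, and the two are combined per stimulus pair through a hypothetical mixture weight `p_learn`. The function names and parameters (`alpha`, `beta`, `p_learn`) are illustrative, and the sketch is neither hierarchical nor Bayesian.

```python
import math

def rl_pair_loglik(choices, rewards, alpha, beta):
    """Log-likelihood of the choices for one stimulus pair under an assumed
    delta-rule learner (learning rate alpha) with softmax choice (sensitivity beta).
    choices: 0/1 index of the chosen stimulus; rewards: outcome on each trial."""
    q = [0.5, 0.5]  # assumed initial values for the two stimuli
    loglik = 0.0
    for c, r in zip(choices, rewards):
        # Two-option softmax probability of the stimulus that was chosen
        p_chosen = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        loglik += math.log(p_chosen)
        # Delta-rule update of the chosen stimulus only
        q[c] += alpha * (r - q[c])
    return loglik

def guess_pair_loglik(choices):
    """Log-likelihood under pure guessing: each choice has probability 0.5."""
    return len(choices) * math.log(0.5)

def mixture_pair_loglik(choices, rewards, alpha, beta, p_learn):
    """Log-likelihood for one pair when it is learned with probability p_learn
    (hypothetical mixture weight, 0 < p_learn < 1) and guessed otherwise."""
    a = math.log(p_learn) + rl_pair_loglik(choices, rewards, alpha, beta)
    b = math.log(1.0 - p_learn) + guess_pair_loglik(choices)
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))
```

Under these assumptions, a pair on which a participant consistently picks the rewarded stimulus is better explained by the learning component, while a pair with choices near chance is explained about equally well by guessing, which is what lets a mixture separate the two kinds of pairs.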
