A hierarchical Bayesian approach to assess learning and guessing strategies in reinforcement learning

Abstract: In two-armed bandit tasks, participants learn which stimulus in a pair is associated with the higher value. In typical reinforcement learning studies, participants are presented with several pairs in random order, and frequently applied analyses assume that each pair is learned in a similar way. When tasks become more difficult, however, participants may learn some stimulus pairs but fail to learn others; that is, they simply guess for a subset of pairs. We put forward the Reinforcement Learning/Guessing (RLGuess) model, which enables researchers to model this learning and guessing process. We implemented the model in a Bayesian hierarchical framework. Simulations showed that the RLGuess model outperforms a standard reinforcement learning model when participants guess: fit is enhanced and parameter estimates become unbiased. An empirical application illustrates the merits of the RLGuess model.
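The learn-or-guess idea in the abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the abstract does not give the model equations, so the delta-rule update, the softmax choice rule, the starting values of 0.5, and every name below (`alpha`, `beta`, `w_guess`, `simulate_pair`, `log_lik_pair`, `posterior_guess`) are assumptions. The hierarchical Bayesian layer over participants is omitted entirely.

```python
import math
import random


def p_choose_a(q_a, q_b, beta):
    """Softmax (logistic) probability of choosing A over B (assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def simulate_pair(n_trials, p_reward_a, p_reward_b, alpha, beta, guesses, rng):
    """Simulate one stimulus pair; returns a list of (chose_a, reward) tuples.

    guesses=True  -> every choice is a coin flip (assumed guessing strategy)
    guesses=False -> delta-rule learning with softmax choice
    """
    q_a, q_b = 0.5, 0.5  # assumed neutral starting values
    trials = []
    for _ in range(n_trials):
        p_a = 0.5 if guesses else p_choose_a(q_a, q_b, beta)
        chose_a = rng.random() < p_a
        p_reward = p_reward_a if chose_a else p_reward_b
        reward = 1.0 if rng.random() < p_reward else 0.0
        if chose_a:
            q_a += alpha * (reward - q_a)  # update only the chosen option
        else:
            q_b += alpha * (reward - q_b)
        trials.append((chose_a, reward))
    return trials


def log_lik_pair(trials, alpha, beta, guesses):
    """Log-likelihood of one pair's choices under learning or under guessing."""
    q_a, q_b = 0.5, 0.5
    total = 0.0
    for chose_a, reward in trials:
        p_a = 0.5 if guesses else p_choose_a(q_a, q_b, beta)
        total += math.log(p_a if chose_a else 1.0 - p_a)
        if chose_a:
            q_a += alpha * (reward - q_a)
        else:
            q_b += alpha * (reward - q_b)
    return total


def posterior_guess(ll_learn, ll_guess, w_guess):
    """Posterior probability that a pair was guessed, given prior weight w_guess.

    Computed from the log-odds so that long trial sequences do not underflow.
    """
    log_odds = (math.log(w_guess) + ll_guess) - (math.log(1.0 - w_guess) + ll_learn)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Usage: simulate a pair that is learned, then ask how plausible guessing is.
rng = random.Random(1)
data = simulate_pair(60, 0.8, 0.2, alpha=0.3, beta=5.0, guesses=False, rng=rng)
ll_learn = log_lik_pair(data, alpha=0.3, beta=5.0, guesses=False)
ll_guess = log_lik_pair(data, alpha=0.3, beta=5.0, guesses=True)
print(posterior_guess(ll_learn, ll_guess, w_guess=0.5))
```

Treating each pair as a two-component mixture is one plausible reading of "learn some pairs, guess on others". In a full analysis the learning parameters and the mixture weight would be estimated jointly across participants rather than fixed as they are here.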

Marieke Jepma | Hilde M. Huizenga | Jessica V. Schaaf | Ingmar Visser
