Putting bandits into context: How function learning supports decision making

We introduce the contextual multi-armed bandit task as a framework for investigating learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximise their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the functional relation between contexts and rewards and improve their decision strategy over time. In three experiments, we explore participants’ behaviour in such learning environments. We predict participants’ behaviour using context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process and linear regression) learning approaches, each combined with different choice strategies. Participants are mostly able to learn about the context-reward functions, and their behaviour is best described by a Gaussian process learning strategy which generalises previous experience to similar instances. In a relatively simple task with binary features, they seem to combine this learning with a “probability of improvement” decision strategy that focuses on alternatives expected to lead to an improvement upon a current favourite option. In a task with continuous features that are linearly related to the rewards, participants seem to balance exploration and exploitation more explicitly. Finally, in a difficult learning environment where the relation between features and rewards is non-linear, some participants are again well described by a Gaussian process learning strategy, whereas others revert to context-blind strategies.
