Generalization and search in risky environments

How do people pursue rewards in risky environments, where some outcomes must be avoided at all costs? We investigate participants' search for spatially correlated rewards in scenarios where they must avoid sampling rewards below a given threshold. This requires participants not only to balance exploration and exploitation, but also to reason about how to avoid potentially risky areas of the search space. Within risky versions of the spatially correlated multi-armed bandit task, we show that participants' behavior is well described by a Gaussian process function learning model that selects options according to a safe optimization routine. Moreover, using leave-one-block-out cross-validation, we find that participants adapt their sampling behavior to the riskiness of the task, although the underlying function learning mechanism remains relatively unchanged. These results show that participants can adapt their search behavior to the adversity of the environment, and they enrich our understanding of adaptive behavior in the face of risk and uncertainty.
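To make the modeling idea concrete, the sketch below pairs Gaussian process regression (with an RBF kernel) with a SafeOpt-style choice rule on a toy one-dimensional risky bandit: an option counts as "safe" only if its lower confidence bound clears the threshold, and choices are made by a softmax over upper confidence bounds within that safe set. This is a minimal illustration under assumed settings (kernel, confidence width `beta`, temperature `tau`, and the toy environment are all illustrative), not the paper's exact implementation.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) kernel over 1-D input locations."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_all, lengthscale=1.0, noise_var=0.1):
    """GP posterior mean and standard deviation at every candidate location."""
    K = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_all, lengthscale)
    K_ss = rbf_kernel(x_all, x_all, lengthscale)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    cov = K_ss - K_s.T @ K_inv @ K_s
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mu, sd

def safe_ucb_choice(mu, sd, threshold, beta=2.0, tau=0.1, rng=None):
    """Softmax choice over UCB values, restricted to the 'safe set':
    options whose lower confidence bound clears the threshold."""
    rng = np.random.default_rng() if rng is None else rng
    lcb = mu - beta * sd          # pessimistic estimate -> safety check
    ucb = mu + beta * sd          # optimistic estimate -> exploration value
    safe = lcb >= threshold
    if not safe.any():            # fallback: the least risky option(s)
        safe = lcb == lcb.max()
    logits = np.where(safe, ucb / tau, -np.inf)
    p = np.exp(logits - logits[safe].max())
    p /= p.sum()
    return rng.choice(len(mu), p=p)

# Toy risky bandit: 30 arms on a line, a smooth latent reward function,
# and a safety threshold below which samples should never fall.
rng = np.random.default_rng(0)
x_all = np.linspace(0, 10, 30)
f = np.sin(x_all) + 0.3 * x_all           # latent (unknown) reward function
threshold = 0.5

x_obs = np.array([5.0])                    # one known-safe starting point
y_obs = np.array([np.sin(5.0) + 1.5])

for t in range(10):
    mu, sd = gp_posterior(x_obs, y_obs, x_all)
    idx = safe_ucb_choice(mu, sd, threshold, rng=rng)
    reward = f[idx] + rng.normal(0, 0.1)
    x_obs = np.append(x_obs, x_all[idx])
    y_obs = np.append(y_obs, reward)
    print(f"trial {t}: chose arm {idx}, reward {reward:.2f}")
```

The key design choice mirrored here is that generalization (the GP posterior over unseen options) and risk avoidance (the safe-set restriction) operate on the same learned function: the model can only rule an option in or out by generalizing from nearby observations.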
