Generalization and search in risky environments

How do people pursue rewards in risky environments, where some outcomes should be avoided at all costs? We investigate how participants search for spatially correlated rewards in scenarios where they must avoid sampling rewards below a given threshold. This requires not only balancing exploration and exploitation, but also reasoning about how to avoid potentially risky areas of the search space. Within risky versions of the spatially correlated multi-armed bandit task, we show that participants’ behavior aligns well with a Gaussian process function learning algorithm that chooses sampling points according to a safe-optimization routine. Moreover, using leave-one-block-out cross-validation, we find that participants adapt their sampling behavior to the riskiness of the task, although the underlying function learning mechanism remains relatively unchanged. These results show that people can adapt their search behavior to the adversity of the environment, and they enrich our understanding of adaptive behavior in the face of risk and uncertainty.
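
To make the safe-optimization idea concrete, here is a minimal sketch, not the authors’ implementation, of how a Gaussian process model can restrict sampling to a “safe set” of options whose lower confidence bound stays above the reward threshold, and then pick the most promising safe option by its upper confidence bound. The RBF kernel, the confidence parameter beta, the 1D option grid, and the fallback rule when no option is provably safe are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    """Squared-exponential (RBF) kernel over 1D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    """GP posterior mean and standard deviation over a grid of options."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    K_ss_diag = np.full(len(x_grid), 1.0)  # prior variance on the diagonal
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = K_ss_diag - np.sum((K_inv @ K_s) * K_s, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def safe_ucb_choice(x_obs, y_obs, x_grid, threshold, beta=2.0):
    """Choose the option with the highest upper confidence bound,
    restricted to the safe set whose lower confidence bound lies
    above the reward threshold."""
    mu, sigma = gp_posterior(x_obs, y_obs, x_grid)
    lcb = mu - beta * sigma   # pessimistic estimate: the safety check
    ucb = mu + beta * sigma   # optimistic estimate: the search value
    safe = lcb >= threshold
    if not safe.any():        # fallback: take the least risky option
        return x_grid[np.argmax(lcb)]
    return x_grid[np.argmax(np.where(safe, ucb, -np.inf))]

# Usage: search over 30 options after three initial safe samples
x_grid = np.linspace(0, 1, 30)
x_obs = np.array([0.2, 0.5, 0.8])
y_obs = np.array([0.6, 1.1, 0.4])
print(safe_ucb_choice(x_obs, y_obs, x_grid, threshold=0.0))
```

Raising beta widens the confidence bounds, which shrinks the safe set and makes the simulated searcher more cautious; this is the kind of knob one would expect to vary with the riskiness of the task.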
