Better safe than sorry: Risky function exploitation through safe optimization

Exploration-exploitation of functions, that is, learning and optimizing a mapping between inputs and expected outputs, is ubiquitous in many real-world situations. These situations sometimes require us to avoid certain outcomes at all costs, for example because they are poisonous, harmful, or otherwise dangerous. We test participants' behavior in scenarios in which they have to find the optimum of a function while at the same time avoiding outputs below a certain threshold. In two experiments, we find that Safe-Optimization, a Gaussian Process-based exploration-exploitation algorithm, describes participants' behavior well, and that participants seem to first check whether a point is safe and only then try to pick the best point among all such safe points. Their trade-off between exploration and exploitation can therefore be seen as an intelligent, approximate, and homeostasis-driven strategy.
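The safe-then-optimal selection rule described above can be sketched in a few lines: fit a Gaussian Process to the observations, call a candidate input safe if its lower confidence bound clears the threshold, and among the safe candidates pick the one with the highest upper confidence bound. The sketch below is a minimal illustration under assumed choices (an RBF kernel, hand-picked length scale, noise level, and confidence scaling beta), not the paper's implementation or the original SafeOpt algorithm.

```python
# Minimal sketch of a safe-then-optimal selection step with a GP.
# Kernel, hyperparameters, and beta are illustrative assumptions.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2, length_scale=1.0):
    """GP posterior mean and standard deviation at the test inputs."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    K_ss = rbf_kernel(X_test, X_test, length_scale)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_train
    cov = K_ss - K_s.T @ K_inv @ K_s
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def safe_optimize_step(X_train, y_train, X_candidates, threshold, beta=2.0):
    """First restrict attention to candidates whose lower confidence bound
    clears the safety threshold, then pick the one with the highest upper
    confidence bound. If no candidate is confidently safe, fall back to
    the candidate least likely to violate the threshold."""
    mu, sigma = gp_posterior(X_train, y_train, X_candidates)
    lower, upper = mu - beta * sigma, mu + beta * sigma
    safe = lower >= threshold
    if not safe.any():
        return X_candidates[np.argmax(lower)]  # no safe point: be conservative
    idx_safe = np.where(safe)[0]
    return X_candidates[idx_safe[np.argmax(upper[idx_safe])]]

# Usage: one selection step on a toy 1-D problem with safety threshold 0.
X_train = np.array([[0.0], [1.0], [2.5]])
y_train = np.array([0.5, 1.2, 0.8])
X_candidates = np.linspace(-1, 4, 101)[:, None]
print(safe_optimize_step(X_train, y_train, X_candidates, threshold=0.0))
```

Note how safety acts as a hard filter before any optimization happens, which mirrors the lexicographic pattern found in participants' choices: safe first, optimal second.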
