Lenient Regret and Good-Action Identification in Gaussian Process Bandits

In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria in which any function value above a certain threshold is “good enough”. On the theoretical side, we study various lenient regret notions under which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret of GP-UCB and an elimination algorithm, circumventing the usual O(√T) term (with time horizon T) that arises from having to zoom in extremely close to the function maximum. We complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” with respect to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches.
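To make the good-action setting concrete, below is a minimal, self-contained Python sketch (not the paper's exact algorithms) of GP-UCB with an early-stopping rule for good-action identification: the run halts once some candidate's lower confidence bound exceeds a known threshold. The RBF kernel, confidence parameter `beta`, threshold `eta`, and test function are all illustrative assumptions introduced here for the example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between the rows of A and B (illustrative choice)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise_var=0.01):
    """Exact GP posterior mean/std at candidate points given noisy observations."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    # RBF prior variance is 1 at every point.
    var = np.clip(1.0 - np.sum(v**2, 0), 1e-12, None)
    return mu, np.sqrt(var)

def gp_ucb_good_action(f, X_cand, eta, T=100, beta=4.0, noise_std=0.1, rng=None):
    """Run GP-UCB over a discrete candidate set; stop early once some point's
    lower confidence bound exceeds the good-action threshold eta.
    This stopping rule is an assumption for illustration, not the paper's method."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_obs, y_obs = [], []
    # Start from a uniformly random candidate.
    x0 = X_cand[rng.integers(len(X_cand))]
    X_obs.append(x0)
    y_obs.append(f(x0) + noise_std * rng.standard_normal())
    for t in range(1, T):
        mu, sigma = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand,
                                 noise_var=noise_std**2)
        lcb = mu - np.sqrt(beta) * sigma
        if lcb.max() >= eta:
            # Some action is confidently "good enough"; no need to find the maximum.
            return X_cand[lcb.argmax()], t
        # Standard GP-UCB acquisition: maximize the upper confidence bound.
        x = X_cand[np.argmax(mu + np.sqrt(beta) * sigma)]
        X_obs.append(x)
        y_obs.append(f(x) + noise_std * rng.standard_normal())
    mu, _ = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand,
                         noise_var=noise_std**2)
    return X_cand[mu.argmax()], T

# Example: seek any x with f(x) above eta = 0.6 on a 1-D domain.
X = np.linspace(0, 1, 200)[:, None]
f = lambda x: np.sin(6 * x[0]) * np.exp(-x[0])
x_good, steps = gp_ucb_good_action(f, X, eta=0.6)
print(f"found candidate x = {x_good[0]:.3f} after {steps} queries")
```

The stopping rule mirrors the idea in the abstract: once the posterior confidently certifies that some action clears the threshold, the algorithm can stop, rather than continuing to zoom in towards the exact maximum.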
