Lenient Regret and Good-Action Identification in Gaussian Process Bandits

In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria in which any function value above a certain threshold is “good enough”. On the theoretical side, we study various lenient regret notions under which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret of GP-UCB and an elimination algorithm, circumventing the usual O(√T) term (with time horizon T) that arises from having to zoom in extremely close to the function maximum. We complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” with respect to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches.
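To make the good-action setting concrete, below is a minimal, self-contained Python sketch (not the paper's exact algorithms) of GP-UCB with an early-stopping rule for good-action identification: the run halts once some candidate's lower confidence bound exceeds a known threshold. The RBF kernel, confidence parameter `beta`, threshold `eta`, and test function are all illustrative assumptions introduced here for the example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between the rows of A and B (illustrative choice)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise_var=0.01):
    """Exact GP posterior mean/std at candidate points given noisy observations."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    # RBF prior variance is 1 at every point.
    var = np.clip(1.0 - np.sum(v**2, 0), 1e-12, None)
    return mu, np.sqrt(var)

def gp_ucb_good_action(f, X_cand, eta, T=100, beta=4.0, noise_std=0.1, rng=None):
    """Run GP-UCB over a discrete candidate set; stop early once some point's
    lower confidence bound exceeds the good-action threshold eta.
    This stopping rule is an assumption for illustration, not the paper's method."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_obs, y_obs = [], []
    # Start from a uniformly random candidate.
    x0 = X_cand[rng.integers(len(X_cand))]
    X_obs.append(x0)
    y_obs.append(f(x0) + noise_std * rng.standard_normal())
    for t in range(1, T):
        mu, sigma = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand,
                                 noise_var=noise_std**2)
        lcb = mu - np.sqrt(beta) * sigma
        if lcb.max() >= eta:
            # Some action is confidently "good enough"; no need to find the maximum.
            return X_cand[lcb.argmax()], t
        # Standard GP-UCB acquisition: maximize the upper confidence bound.
        x = X_cand[np.argmax(mu + np.sqrt(beta) * sigma)]
        X_obs.append(x)
        y_obs.append(f(x) + noise_std * rng.standard_normal())
    mu, _ = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand,
                         noise_var=noise_std**2)
    return X_cand[mu.argmax()], T

# Example: seek any x with f(x) above eta = 0.6 on a 1-D domain.
X = np.linspace(0, 1, 200)[:, None]
f = lambda x: np.sin(6 * x[0]) * np.exp(-x[0])
x_good, steps = gp_ucb_good_action(f, X, eta=0.6)
print(f"found candidate x = {x_good[0]:.3f} after {steps} queries")
```

The stopping rule mirrors the idea in the abstract: once the posterior confidently certifies that some action clears the threshold, the algorithm can stop, rather than continuing to zoom in towards the exact maximum.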
