Smooth Bandit Optimization: Generalization to Hölder Space

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Hölder continuous (including Lipschitz) functions with $0 < \alpha \leq 1$. Our main result is in generalization of the reward function to Hölder space with exponent $\alpha > 1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Hölder continuous functions, approaches based on random sampling in bins of a discretized domain suffice as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ when $\alpha > 1$, which matches the existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Hölder spaces indexed by $\alpha$, using a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves a regret rate that matches the existing lower bound for adaptation within the $\alpha \leq 1$ subset.
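The two-layer structure described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact algorithm: the top layer runs a UCB1-style index over bins of a uniform discretization of $[0,1]$, and the bottom layer fits a (possibly misspecified) degree-1 polynomial to the samples inside the chosen bin and plays its maximizer, with occasional uniform exploration. The function name `two_layer_bandit` and all parameters are illustrative assumptions.

```python
import math
import random

def two_layer_bandit(f, T, n_bins=8, noise=0.0, seed=0):
    """Hypothetical sketch of a two-layer smooth-bandit strategy:
    top layer = UCB over bins; bottom layer = line fit within a bin."""
    rng = random.Random(seed)
    edges = [(i / n_bins, (i + 1) / n_bins) for i in range(n_bins)]
    samples = [[] for _ in range(n_bins)]   # (x, y) pairs per bin
    sums = [0.0] * n_bins                   # reward sums per bin
    for t in range(1, T + 1):
        # Top layer: optimistic bin choice (UCB1-style index).
        def index(i):
            n = len(samples[i])
            if n == 0:
                return float("inf")
            return sums[i] / n + math.sqrt(2.0 * math.log(t) / n)
        i = max(range(n_bins), key=index)
        lo, hi = edges[i]
        # Bottom layer: explore uniformly at first, then exploit a
        # least-squares line fit (a misspecified degree-1 model).
        if len(samples[i]) < 5 or rng.random() < 0.1:
            x = rng.uniform(lo, hi)
        else:
            xs, ys = zip(*samples[i])
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            var = sum((a - mx) ** 2 for a in xs) or 1e-12
            slope = cov / var               # least-squares slope
            x = hi if slope >= 0 else lo    # maximizer of the fitted line
        y = f(x) + noise * rng.gauss(0.0, 1.0)
        samples[i].append((x, y))
        sums[i] += y
    # Recommend the midpoint of the most-pulled bin.
    counts = [len(s) for s in samples]
    best = max(range(n_bins), key=lambda i: counts[i])
    return (edges[best][0] + edges[best][1]) / 2.0, counts
```

In the paper's setting the bottom layer would instead use a polynomial bandit of degree matched to $\lfloor \alpha \rfloor$, so that higher-order smoothness translates into a smaller misspecification error per bin; the sketch above only conveys the bin/sub-algorithm decomposition.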
