Smooth Bandit Optimization: Generalization to Hölder Space

We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-Hölder continuous (including Lipschitz) functions with $0 < \alpha \leq 1$. Our main result is in generalization of the reward function to Hölder space with exponent $\alpha > 1$ to bridge the gap between Lipschitz bandits and infinitely-differentiable models such as linear bandits. For Hölder continuous functions, approaches based on random sampling in bins of a discretized domain suffice as optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ when $\alpha > 1$, which matches the existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of Hölder spaces indexed by $\alpha$, using a bandit model selection approach applied with our proposed two-layer algorithms. We show that it achieves a regret rate that matches the existing lower bound for adaptation within the $\alpha \leq 1$ subset.
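The two-layer structure described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact algorithm: the top layer runs a UCB1-style index over bins of a uniform discretization of $[0,1]$, and the bottom layer fits a (possibly misspecified) degree-1 polynomial to the samples inside the chosen bin and plays its maximizer, with occasional uniform exploration. The function name `two_layer_bandit` and all parameters are illustrative assumptions.

```python
import math
import random

def two_layer_bandit(f, T, n_bins=8, noise=0.0, seed=0):
    """Hypothetical sketch of a two-layer smooth-bandit strategy:
    top layer = UCB over bins; bottom layer = line fit within a bin."""
    rng = random.Random(seed)
    edges = [(i / n_bins, (i + 1) / n_bins) for i in range(n_bins)]
    samples = [[] for _ in range(n_bins)]   # (x, y) pairs per bin
    sums = [0.0] * n_bins                   # reward sums per bin
    for t in range(1, T + 1):
        # Top layer: optimistic bin choice (UCB1-style index).
        def index(i):
            n = len(samples[i])
            if n == 0:
                return float("inf")
            return sums[i] / n + math.sqrt(2.0 * math.log(t) / n)
        i = max(range(n_bins), key=index)
        lo, hi = edges[i]
        # Bottom layer: explore uniformly at first, then exploit a
        # least-squares line fit (a misspecified degree-1 model).
        if len(samples[i]) < 5 or rng.random() < 0.1:
            x = rng.uniform(lo, hi)
        else:
            xs, ys = zip(*samples[i])
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            var = sum((a - mx) ** 2 for a in xs) or 1e-12
            slope = cov / var               # least-squares slope
            x = hi if slope >= 0 else lo    # maximizer of the fitted line
        y = f(x) + noise * rng.gauss(0.0, 1.0)
        samples[i].append((x, y))
        sums[i] += y
    # Recommend the midpoint of the most-pulled bin.
    counts = [len(s) for s in samples]
    best = max(range(n_bins), key=lambda i: counts[i])
    return (edges[best][0] + edges[best][1]) / 2.0, counts
```

In the paper's setting the bottom layer would instead use a polynomial bandit of degree matched to $\lfloor \alpha \rfloor$, so that higher-order smoothness translates into a smaller misspecification error per bin; the sketch above only conveys the bin/sub-algorithm decomposition.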
