Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

We consider the variant of the stochastic multi-armed bandit problem where the reward distributions may change abruptly several times. In contrast to previous work, we achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called AdSwitch and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
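For concreteness, the dynamic-regret criterion against the optimal non-stationary policy can be formalized as follows; the abstract does not fix notation, so the symbols below are our assumptions. With K arms, horizon T, and piecewise-constant mean rewards \mu_t(a) that change at N (unknown) points in time, a learner playing arm a_t at round t incurs

R_T \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \Big( \max_{a} \mu_t(a) \;-\; \mu_t(a_t) \Big) \right],

i.e., the comparator is the best arm at every individual round rather than a single arm fixed over the whole horizon. For this piecewise-stationary setting the minimax regret is known to be of order \sqrt{K N T} up to logarithmic factors; the contribution highlighted in the abstract is that AdSwitch attains a bound of this order without N being supplied as a tuning parameter.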
