Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). In the adversarial regime with a self-bounding constraint, and in the stochastic regime with adversarial corruptions as its special case, we improve the dependence on the corruption magnitude $C$. In particular, for $C = \Theta\left(\frac{T}{\log T}\right)$, where $T$ is the time horizon, we achieve an improvement by a multiplicative factor of $\sqrt{\frac{\log T}{\log \log T}}$ relative to the bound of Zimmert and Seldin (2021). We also improve the dependence of the regret bound on the time horizon from $\log T$ to $\log \frac{(K-1)T}{\left(\sum_{i \neq i^*} \frac{1}{\Delta_i}\right)^2}$, where $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps of suboptimal arms $i$, and $i^*$ is the optimal arm. Additionally, we provide a general analysis that makes it possible to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.
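
For intuition, here is the arithmetic behind the claimed multiplicative factor, as a sketch under simplifying assumptions (all gaps equal to $\Delta$, constants and lower-order terms dropped, and the corruption term taken to improve from roughly $\sqrt{C \frac{K}{\Delta} \log T}$ to $\sqrt{C \frac{K}{\Delta} \log \frac{T}{C}}$, which is an illustrative simplification rather than the exact statement of the bounds):

    C = \Theta\!\left(\frac{T}{\log T}\right)
    \;\Longrightarrow\;
    \log\frac{T}{C} = \Theta(\log\log T)
    \;\Longrightarrow\;
    \sqrt{C\,\tfrac{K}{\Delta}\log T}\,\Big/\,\sqrt{C\,\tfrac{K}{\Delta}\log\tfrac{T}{C}}
    = \Theta\!\left(\sqrt{\frac{\log T}{\log\log T}}\right).

For readers unfamiliar with the algorithm itself, the following is a minimal sketch of the Tsallis-INF sampling step with power $\alpha = 1/2$, assuming a plain importance-weighted loss estimator and learning rate $\eta_t = 1/\sqrt{t}$; the algorithm of Zimmert and Seldin (2021) differs in constants and also supports a reduced-variance estimator. All names here (tsallis_inf_probs, L_hat, means) are illustrative, not taken from the paper.

    import numpy as np

    def tsallis_inf_probs(L_hat, eta, iters=60):
        """Sampling distribution p_i = (eta * (L_hat_i - x))^(-2), with the
        normalizer x < min_i L_hat_i found by bisection so that sum_i p_i = 1."""
        lo = L_hat.min() - np.sqrt(len(L_hat)) / eta  # here sum_i p_i <= 1
        hi = L_hat.min()                              # sum diverges as x -> hi
        for _ in range(iters):
            x = (lo + hi) / 2.0
            if np.sum((eta * (L_hat - x)) ** -2.0) < 1.0:
                lo = x
            else:
                hi = x
        p = (eta * (L_hat - lo)) ** -2.0
        return p / p.sum()  # guard against residual bisection error

    # Toy run against a stochastic (Bernoulli-loss) environment.
    rng = np.random.default_rng(0)
    K, T = 5, 10000
    means = rng.uniform(0.25, 0.75, size=K)  # hypothetical arm loss means
    L_hat = np.zeros(K)                      # cumulative loss estimates
    for t in range(1, T + 1):
        p = tsallis_inf_probs(L_hat, eta=1.0 / np.sqrt(t))
        i = rng.choice(K, p=p)
        loss = float(rng.random() < means[i])
        L_hat[i] += loss / p[i]              # importance-weighted loss estimate

The bisection bracket is valid because the normalizing sum is at most 1 at the left endpoint (every term is at most $1/K$ there) and diverges as $x$ approaches $\min_i \hat{L}_i$ from below.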

[1] Aleksandrs Slivkins et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits. ICML, 2014.

[2] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 1952.

[3] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 1933.

[4] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem. SIAM J. Comput., 2002.

[5] Renato Paes Leme et al. Stochastic bandits robust to adversarial corruptions. STOC, 2018.

[6] Julian Zimmert et al. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously. ICML, 2019.

[7] Haipeng Luo et al. More Adaptive Algorithms for Adversarial Bandits. COLT, 2018.

[8] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 1988.

[9] Julian Zimmert et al. Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits. J. Mach. Learn. Res., 2021.

[10] Gábor Lugosi et al. An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits. COLT, 2017.

[11] Jean-Yves Audibert et al. Minimax Policies for Adversarial and Stochastic Bandits. COLT, 2009.

[12] Jean-Yves Audibert et al. Regret Bounds and Minimax Policies under Partial Monitoring. J. Mach. Learn. Res., 2010.

[13] Aleksandrs Slivkins et al. The Best of Both Worlds: Stochastic and Adversarial Bandits. COLT (25th Annual Conference on Learning Theory), 2012.

[14] Ambuj Tewari et al. Fighting Bandits with a New Kind of Smoothness. NIPS, 2015.

[15] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.

[16] Yevgeny Seldin et al. Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits. COLT, 2020.

[17] Peter Auer et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Period. Math. Hung., 2010.

[18] Peter Auer et al. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. COLT, 2016.