Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves $O\big(\big(\sum_{i \neq i^*} \frac{1}{\Delta_i}\big) \log_+\big((K-1)T$ …

[1] Renato Paes Leme, et al. Stochastic bandits robust to adversarial corruptions, 2018, STOC.

[2] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.

[3] Yevgeny Seldin, et al. An Algorithm for Stochastic and Adversarial Bandits with Switching Costs, 2021, ICML.

[4] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics, 1988.

[5] Haipeng Luo, et al. Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition, 2020, NeurIPS.

[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[7] Haipeng Luo, et al. More Adaptive Algorithms for Adversarial Bandits, 2018, COLT.

[8] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[9] Aleksandrs Slivkins, et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits, 2014, ICML.

[10] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[11] Ioannis Chatzigeorgiou, et al. Bounds on the Lambert Function and Their Application to the Outage Analysis of User Cooperation, 2013, IEEE Communications Letters.

[12] Julian Zimmert, et al. Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits, 2018, J. Mach. Learn. Res.

[13] Csaba Szepesvari, et al. Bandit Algorithms, 2020.

[14] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.

[15] Anupam Gupta, et al. Better Algorithms for Stochastic Bandits with Adversarial Corruptions, 2019, COLT.

[16] Tor Lattimore, et al. Refining the Confidence Level for Optimistic Bandit Strategies, 2018, J. Mach. Learn. Res.

[17] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[18] Yevgeny Seldin, et al. Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits, 2020, COLT.

[19] Gábor Lugosi, et al. An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits, 2017, COLT.

[20] Peter Auer, et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, 2010, Period. Math. Hung.

[21] Peter Auer, et al. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits, 2016, COLT.

[22] Julian Zimmert, et al. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously, 2019, ICML.

[23] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, J. Mach. Learn. Res.

[24] Aleksandrs Slivkins, et al. The Best of Both Worlds: Stochastic and Adversarial Bandits, 2012, COLT.

[25] Ambuj Tewari, et al. Fighting Bandits with a New Kind of Smoothness, 2015, NIPS.