Paper Information - Bandits With Heavy Tail

Bandits With Heavy Tail

The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, for some ε ∈ (0,1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds, which show that the best achievable regret deteriorates when ε < 1.
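As a rough illustration of two of the estimators named in the abstract, the sketch below implements a median-of-means estimator and a truncated empirical mean in plain Python. The function names, the equal-size blocking, and the single fixed truncation threshold are assumptions made for this example; the paper's own estimators choose the number of blocks and the truncation level from the moment bound and the confidence level, which is not reproduced here.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Median of the block averages (hypothetical helper, not the paper's code).

    The first num_blocks * block_size samples are split into consecutive
    blocks of equal size; any leftover samples are ignored.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


def truncated_mean(samples, threshold):
    """Empirical mean with samples of magnitude above threshold set to zero.

    Assumption: a single fixed threshold is used for every sample, which is
    a simplification chosen for this sketch.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    kept = [x if abs(x) <= threshold else 0.0 for x in samples]
    return sum(kept) / len(samples)


# One extreme observation dominates the plain empirical mean,
# while both robust estimators stay close to the bulk of the data.
rewards = [1, 1, 1, 1, 1, 1000]
plain = sum(rewards) / len(rewards)            # 167.5
robust = median_of_means(rewards, 3)           # 1.0
clipped = truncated_mean([1, 1, 1, 1000], 10)  # 0.75
```

In a bandit strategy of the kind the abstract describes, an estimate like this would replace the empirical mean inside an upper-confidence index, with the confidence width set by the assumed moment bound.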
