Extreme Bandits Using Robust Statistics

We consider a multi-armed bandit problem motivated by situations where only the extreme values, rather than the expected values of the classical bandit setting, are of interest. We propose distribution-free algorithms based on robust statistics and characterize their statistical properties. We show that the proposed algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms. Their finite-sample performance is demonstrated through numerical experiments, whose results show that the proposed algorithms outperform well-known alternatives.

The work of Gennady Samorodnitsky was conducted as a consulting researcher at Baidu Research, Bellevue, WA.
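For context, the extremal regret referenced in the abstract is commonly formalized as follows; the notation (K arms, reward X_{k,t} from arm k at round t, arm I_t chosen by the policy, horizon T) is the standard convention of the extreme-bandits literature and is an assumption here, not a quotation from the paper:

    R_T \;=\; \max_{1 \le k \le K} \mathbb{E}\Big[ \max_{1 \le t \le T} X_{k,t} \Big]
          \;-\; \mathbb{E}\Big[ \max_{1 \le t \le T} X_{I_t,\,t} \Big]

The first term is the expected maximum reward an oracle would collect by playing the single best arm for all T rounds; the second is the expected maximum reward actually collected by the policy. "Vanishing extremal regret" is then usually read as R_T becoming negligible relative to the oracle's expected maximum as T grows.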
