Robust Algorithms for Multiagent Bandits with Heavy Tails

We study the heavy-tailed stochastic bandit problem in the cooperative multiagent setting, where a group of agents interacts with a common bandit problem while communicating over a network with delays. Existing algorithms for the stochastic bandit in this setting use confidence intervals arising from an averaging-based communication protocol known as running consensus, which does not lend itself to robust estimation in heavy-tailed settings. We propose MP-UCB, a decentralized multiagent algorithm for the cooperative stochastic bandit that combines robust estimation with a message-passing protocol. We prove optimal regret bounds for MP-UCB in several problem settings and demonstrate its superiority over existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem and provide efficient algorithms for robust location estimation in the bandit setting.
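
To make the robust-estimation ingredient concrete, below is a minimal single-agent sketch in Python of an upper confidence bound index built on the median-of-means estimator, in the spirit of the Robust UCB of Bubeck, Cesa-Bianchi, and Lugosi for heavy-tailed rewards. The `median_of_means` and `robust_ucb` names, the block count, the fixed confidence level, and the constant in the confidence width are illustrative assumptions; the sketch also omits MP-UCB's message-passing protocol and network delays entirely.

```python
import numpy as np

def median_of_means(samples, delta):
    """Median-of-means estimate of a heavy-tailed mean.

    Split the samples into k = O(log(1/delta)) blocks, average each
    block, and return the median of the block means.  Unlike the
    empirical mean, this enjoys sub-Gaussian-style deviation bounds
    assuming only a finite variance.
    """
    samples = np.asarray(samples, dtype=float)
    # Block count: illustrative constant; capped so every block is nonempty.
    k = max(1, min(len(samples), int(np.ceil(8.0 * np.log(1.0 / delta)))))
    return float(np.median([b.mean() for b in np.array_split(samples, k)]))

def robust_ucb(arms, horizon, v, delta=0.01):
    """Single-agent UCB with a median-of-means index (illustrative sketch).

    `arms` is a list of callables, each returning one reward sample;
    `v` is an assumed upper bound on the reward variance.  The constant
    12 in the width follows the finite-variance case of Robust UCB, but
    exact constants vary across analyses; a time-dependent delta would
    be needed for an anytime regret guarantee.
    """
    rewards = [[arm()] for arm in arms]  # pull each arm once to initialize
    for _ in range(len(arms), horizon):
        index = [
            median_of_means(r, delta)
            + np.sqrt(12.0 * v * np.log(1.0 / delta) / len(r))
            for r in rewards
        ]
        a = int(np.argmax(index))        # play the arm with the largest index
        rewards[a].append(arms[a]())
    return rewards
```

For instance, `robust_ucb([lambda: 1.0 + np.random.standard_t(2.5), lambda: np.random.standard_t(2.5)], horizon=1000, v=5.0)` runs the sketch on two Student-t arms, which are heavy-tailed but have finite variance (df/(df-2) = 5 for df = 2.5), a regime where empirical-mean confidence intervals are not valid.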
