Unimodal Thompson Sampling for Graph-Structured Arms

We study what is, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph, and each edge encodes a relationship, unknown to the learner, between the expected rewards of the two nodes it connects. Furthermore, from any node of the graph there is a path leading to the unique node with the maximum expected reward, along which the expected reward is monotonically increasing. Previous results for this setting describe the behavior of frequentist MAB algorithms. In this paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that, as happens in a wide range of scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our algorithm and of state-of-the-art algorithms as the properties of the graph vary.
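To make the setting concrete, the following is a minimal sketch of one way Thompson Sampling could be restricted to a graph neighborhood. It is not the algorithm analyzed in the paper, whose details are not given in this abstract. The sketch assumes Bernoulli rewards, uniform Beta(1, 1) priors, a graph given as an adjacency dictionary, and a rule that samples only the arm with the highest empirical mean and its neighbors. The names `unimodal_ts_sketch` and `line_graph` are hypothetical.

```python
import random


def line_graph(n):
    """Adjacency dict of a path graph on n nodes (hypothetical helper)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def unimodal_ts_sketch(means, neighbors, horizon, seed=0):
    """Simulate neighborhood-restricted Thompson Sampling on Bernoulli arms.

    means:     true expected reward of each arm (unknown to the learner)
    neighbors: adjacency dict, node -> list of adjacent nodes
    Returns (pulls, wins), the per-arm pull and success counts.
    """
    rng = random.Random(seed)
    n = len(means)
    wins = [0] * n
    pulls = [0] * n
    for _ in range(horizon):
        # Empirical means; arms never pulled are treated as 0.0 (assumption).
        emp = [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(n)]
        leader = max(range(n), key=lambda i: emp[i])
        # Only the leader and its neighbors compete in this round.
        candidates = [leader] + list(neighbors[leader])
        # Draw one sample per candidate from its Beta posterior.
        samples = {
            i: rng.betavariate(1 + wins[i], 1 + pulls[i] - wins[i])
            for i in candidates
        }
        arm = max(candidates, key=lambda i: samples[i])
        # Simulated Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < means[arm] else 0
        wins[arm] += reward
        pulls[arm] += 1
    return pulls, wins


if __name__ == "__main__":
    # Unimodal expected rewards along a path: increasing up to arm 3.
    means = [0.1, 0.3, 0.5, 0.9, 0.4]
    pulls, wins = unimodal_ts_sketch(means, line_graph(len(means)), horizon=2000)
    print("pulls per arm:", pulls)
```

Because only the leader's neighborhood is sampled in each round, the number of arms competing at any time depends on the degree of the leader rather than on the size of the graph, which is the intuition behind exploiting unimodality.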
