Networked Stochastic Multi-armed Bandits with Combinatorial Strategies

In this paper, we investigate a substantially extended version of the classical multi-armed bandit (MAB) problem, called the networked combinatorial bandit problem. In particular, we consider a decision maker facing networked bandits as follows: at each time slot a combinatorial strategy, e.g., a group of arms, is chosen, and the decision maker receives a reward resulting from her strategy and also receives a side bonus resulting from that strategy for each arm's neighbors. This is motivated by many real applications such as online social networks, where friends can provide feedback on shared content; therefore, if we promote a product to a user, we can also collect feedback from her friends on that product. To this end, we consider two types of side bonus in this study: side observation and side reward. Depending on the number of arms pulled at each time slot, we study two cases: single-play and combinatorial-play. Consequently, this leaves us four scenarios to investigate in the presence of side bonus: Single-play with Side Observation, Combinatorial-play with Side Observation, Single-play with Side Reward, and Combinatorial-play with Side Reward. For each case, we present and analyze a series of zero-regret policies, for which the expected time-averaged regret approaches zero as time goes to infinity. Extensive simulations validate the effectiveness of our results.
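To make the Single-play with Side Observation scenario concrete, the following is a minimal sketch, not the policy analyzed in the paper. It assumes Bernoulli arms and a standard UCB1-style index, and the function name `ucb_with_side_observation`, the example graph, and the arm means are all hypothetical choices made for illustration. Pulling an arm yields its reward and also reveals one sample from each of its neighbors, which updates their estimates without contributing to the collected reward.

```python
import math
import random


def ucb_with_side_observation(neighbors, means, horizon, seed=0):
    """UCB1-style single-play sketch with side observation (illustrative only).

    neighbors: dict mapping arm index -> list of neighboring arm indices
    means:     true Bernoulli means, used only to simulate samples and
               to compute pseudo-regret (unknown to the policy itself)
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n   # observations per arm, direct pulls plus side observations
    sums = [0.0] * n   # sum of observed samples per arm
    best = max(means)
    pseudo_regret = 0.0

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(n) if counts[i] == 0]
        if unobserved:
            # Make sure every arm has at least one observation first.
            arm = unobserved[0]
        else:
            # Empirical mean plus an exploration bonus (UCB1 index).
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        # Observe the pulled arm and, as side observation, each neighbor.
        for j in {arm, *neighbors[arm]}:
            sample = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            sums[j] += sample

        # Only the pulled arm counts toward regret in this scenario.
        pseudo_regret += best - means[arm]

    return pseudo_regret, counts


# Hypothetical example: three arms on a path graph 0 - 1 - 2.
graph = {0: [1], 1: [0, 2], 2: [1]}
regret, counts = ucb_with_side_observation(graph, [0.2, 0.5, 0.8], horizon=5000)
print(regret / 5000, counts)
```

The printed ratio is the time-averaged pseudo-regret, the quantity that a zero-regret policy drives toward zero as the horizon grows. In the side-reward scenarios, the neighbors' samples would also be added to the collected reward, which changes which arm is optimal; that variant is not shown here.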
