Networked Stochastic Multi-armed Bandits with Combinatorial Strategies

In this paper, we investigate a substantially extended version of the classical MAB problem, called the networked combinatorial bandit problem. In particular, we consider a decision maker facing networked bandits as follows: at each time slot a combinatorial strategy, e.g., a group of arms, is chosen, and the decision maker receives a reward resulting from her strategy and also receives a side bonus resulting from that strategy for each arm's neighbor. This is motivated by many real applications such as online social networks, where friends can provide feedback on shared content; therefore, if we promote a product to a user, we can also collect feedback from her friends on that product. To this end, we consider two types of side bonus in this study: side observation and side reward. Depending on the number of arms pulled at each time slot, we study two cases: single-play and combinatorial-play. Consequently, this leaves us four scenarios to investigate in the presence of side bonus: Single-play with Side Observation, Combinatorial-play with Side Observation, Single-play with Side Reward, and Combinatorial-play with Side Reward. For each case, we present and analyze a series of zero-regret policies, for which the expected regret over time approaches zero as time goes to infinity. Extensive simulations validate the effectiveness of our results.
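To make the Single-play with Side Observation scenario concrete, here is a minimal sketch, not the paper's policy. It assumes Bernoulli arms, a fixed neighbor list, and a standard UCB1-style index in which side observations simply count as extra samples. The function name `run_side_observation_ucb` and all parameters are hypothetical, introduced only for illustration.

```python
import math
import random


def run_side_observation_ucb(means, neighbors, horizon, seed=0):
    """Illustrative UCB1-style single-play with side observation.

    means:     true Bernoulli mean of each arm (unknown to the learner,
               used here only to simulate rewards and measure regret)
    neighbors: neighbors[i] lists the arms adjacent to arm i
    horizon:   number of time slots
    Returns (average pseudo-regret per slot, observation counts per arm).
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n   # observations per arm (direct pulls + side observations)
    sums = [0.0] * n   # sum of observed samples per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        unseen = [i for i in range(n) if counts[i] == 0]
        if unseen:
            # Make sure every arm has at least one observation first.
            arm = unseen[0]
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        # Only the pulled arm yields reward, so regret depends on it alone.
        regret += best - means[arm]

        # Side observation: the pulled arm and each of its neighbors
        # reveal one sample, which sharpens their estimates.
        for j in {arm, *neighbors[arm]}:
            sample = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            sums[j] += sample

    return regret / horizon, counts


if __name__ == "__main__":
    # Three arms on a path graph: 0 - 1 - 2.
    avg_regret, counts = run_side_observation_ucb(
        means=[0.2, 0.5, 0.8], neighbors=[[1], [0, 2], [1]], horizon=5000
    )
    print(avg_regret, counts)
```

Running the sketch with increasing `horizon` values gives an informal view of what a zero-regret policy means here: the average pseudo-regret per slot should shrink as the horizon grows. In the Side Reward variants, the neighbors' samples would also be added to the collected reward rather than only to the estimates.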

Shaojie Tang | Yaqin Zhou | Weili Wu | Jing Yuan | Kai Han | Zhao Zhang

[1] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT 2009.

[2] Bhaskar Krishnamachari, et al. On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance, 2007, IEEE Transactions on Wireless Communications.

[3] Anna N. Sorokina, et al. Optimization of ads allocation in sponsored search, 2013, WWW '13 Companion.

[4] Cheng Soon Ong, et al. Multivariate Spearman's ρ for aggregating ranks using copulas, 2016.

[5] Mingyan Liu, et al. Online Learning in Decentralized Multiuser Resource Sharing Problems, 2012, ArXiv.

[6] Moshe Babaioff, et al. Dynamic Pricing with Limited Supply, 2011, ACM Trans. Economics and Comput.

[7] Alexandre Proutière, et al. Combinatorial Bandits Revisited, 2015, NIPS.

[8] Atilla Eryilmaz, et al. Multi-armed bandits in the presence of side observations in social networks, 2013, 52nd IEEE Conference on Decision and Control.

[9] Zheng Wen, et al. Matroid Bandits: Fast Combinatorial Optimization with Learning, 2014, UAI.

[10] Atilla Eryilmaz, et al. Stochastic bandits with side observations on networks, 2014, SIGMETRICS '14.

[11] Jure Leskovec, et al. Information diffusion and external influence in networks, 2012, KDD.

[12] Wei Chen, et al. Combinatorial Pure Exploration of Multi-Armed Bandits, 2014, NIPS.

[13] Yajun Wang, et al. Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms, 2014, J. Mach. Learn. Res.

[14] Wei Chen, et al. Combinatorial multi-armed bandit: general framework, results and applications, 2013, ICML 2013.

[15] Shaojie Tang, et al. Almost optimal accessing of nonstochastic channels in cognitive radio networks, 2012, 2012 Proceedings IEEE INFOCOM.

[16] J. Tsitsiklis, et al. Stochastic shortest path problems with recourse, 1996.

[17] Wtt Wtt. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits, 2015.

[18] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, 1987.

[19] Wei Chen, et al. Combinatorial Multi-Armed Bandit with General Reward Functions, 2016, NIPS.

[20] Marc Lelarge, et al. Leveraging Side Observations in Stochastic Bandits, 2012, UAI.

[21] Bhaskar Krishnamachari, et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations, 2010, IEEE/ACM Transactions on Networking.

[22] Shie Mannor, et al. From Bandits to Experts: On the Value of Side-Observations, 2011, NIPS.

[23] Zheng Wen, et al. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits, 2014, AISTATS.

[24] Naumaan Nayyar, et al. Decentralized Learning for Multiplayer Multiarmed Bandits, 2014, IEEE Transactions on Information Theory.