Multi-armed bandits in the presence of side observations in social networks

We consider the decision problem of an external agent choosing to execute one of M actions for each user in a social network. We assume that observing a user's actions provides valuable information about a larger set of users, since each user's preferences are interrelated with those of her social peers. This falls into the well-known setting of multi-armed bandit (MAB) problems, but with the critical new component of side observations resulting from interactions between users. Our contributions in this work are as follows: 1) We model the MAB problem in the presence of side observations and obtain an asymptotic lower bound, expressed as a function of the network structure, on the regret (loss) of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose a randomized policy that explores actions for each user at a rate that depends on her position in the network. We show that this policy achieves the asymptotic lower bound on the regret associated with actions that are unpopular for all users. 3) We derive an upper bound on the regret of existing Upper Confidence Bound (UCB) policies for MAB problems, modified for our setting of side observations. We present case studies showing that these UCB policies are agnostic of the network structure, which causes their regret to suffer in a network setting. Our investigations reveal the significant gains that can be obtained even through static network-aware policies.
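The abstract does not spell out how the UCB policies are modified for side observations. The sketch below shows one natural modification under a simplified, hypothetical model that is not necessarily the paper's. Each arm (for example, a user-action pair) has a Bernoulli reward. Playing an arm also reveals the rewards of its neighbours in a given graph. The standard UCB1 index (Auer, Cesa-Bianchi and Fischer, 2002) is then computed from counts that include these side observations. The names `ucb_with_side_observations`, `means` and `neighbors` are illustrative only.

```python
import math
import random


def ucb_with_side_observations(means, neighbors, horizon, seed=0):
    """UCB1-style index policy whose counts include side observations.

    Hypothetical model (an assumption, not the paper's exact setup):
    - arm k yields a Bernoulli reward with mean means[k];
    - playing arm k reveals the reward of arm k and of every arm in
      neighbors[k] (the side observations).
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    obs_count = [0] * num_arms  # direct + side observations per arm
    obs_sum = [0.0] * num_arms  # sum of observed rewards per arm
    plays = [0] * num_arms      # direct plays per arm

    for t in range(1, horizon + 1):
        unobserved = [k for k in range(num_arms) if obs_count[k] == 0]
        if unobserved:
            # Initialisation: play an arm that has never been observed.
            arm = unobserved[0]
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks with the total number of observations of the arm.
            arm = max(
                range(num_arms),
                key=lambda k: obs_sum[k] / obs_count[k]
                + math.sqrt(2.0 * math.log(t) / obs_count[k]),
            )
        plays[arm] += 1
        # Observe the played arm and, for free, all of its neighbours.
        for k in {arm, *neighbors[arm]}:
            reward = 1.0 if rng.random() < means[k] else 0.0
            obs_count[k] += 1
            obs_sum[k] += reward
    return plays


# Star graph: arm 0 is a hub whose play reveals every other arm.
plays = ucb_with_side_observations(
    means=[0.2, 0.3, 0.9, 0.1],
    neighbors=[[1, 2, 3], [0], [0], [0]],
    horizon=5000,
    seed=1,
)
print(plays)
```

In this toy run a single play of the hub arm observes every arm at once, so the exploration cost is shared across the graph. The index itself ignores where each arm sits in the network. This is the kind of network-agnostic behaviour the abstract contrasts with network-aware exploration.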
