Restless Bandits with Constrained Arms: Applications in Social and Information Networks

We study a problem of information gathering in a social network with dynamically available sources and time-varying quality of information. We formulate this problem as a restless multi-armed bandit (RMAB), in which the information quality of a source corresponds to the state of an arm. The decision-making agent does not know the quality of information from the sources a priori, but it maintains a belief about the quality of information from each source; the problem is therefore an RMAB with partially observable states. The objective of the agent is to gather relevant information efficiently by contacting sources. We formulate this as an infinite-horizon discounted reward problem, where the reward depends on the quality of information. We study Whittle's index policy, which determines the sequence in which arms are played with the aim of maximizing the long-term cumulative reward. Through numerical simulation, we illustrate the performance of the index policy and the myopic policy and compare them with a uniform random policy.
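To make the belief-based formulation concrete, the following is a minimal sketch of how an agent might track its belief about one source and make a myopic choice among available sources. It is not the model from the paper: the two-state (good/bad) Markov arm, the binary signal, and all names and parameters (`propagate_belief`, `bayes_update`, `myopic_choice`, `p_good_to_good`, `p_bad_to_good`, `rho_good`, `rho_bad`) are assumptions made for illustration.

```python
def propagate_belief(belief, p_good_to_good, p_bad_to_good):
    """One-step prediction of P(source is good) under an assumed two-state Markov chain."""
    return belief * p_good_to_good + (1.0 - belief) * p_bad_to_good


def bayes_update(belief, observed_good, rho_good, rho_bad):
    """Posterior P(source is good) after an assumed binary signal.

    rho_good and rho_bad are the (hypothetical) probabilities of a good
    signal when the source is in the good and bad state respectively.
    """
    if observed_good:
        numerator = belief * rho_good
        denominator = numerator + (1.0 - belief) * rho_bad
    else:
        numerator = belief * (1.0 - rho_good)
        denominator = numerator + (1.0 - belief) * (1.0 - rho_bad)
    # If the observation has zero probability, keep the prior belief.
    return numerator / denominator if denominator > 0 else belief


def myopic_choice(beliefs, available):
    """Pick the available source with the highest belief.

    Assumes the expected immediate reward is increasing in the belief;
    returns None when no source is available.
    """
    candidates = [i for i, ok in enumerate(available) if ok]
    if not candidates:
        return None
    return max(candidates, key=lambda i: beliefs[i])
```

In this sketch, a source that is contacted would have its belief passed through `bayes_update` and then `propagate_belief`, while a source that is not contacted (or is unavailable) would only be passed through `propagate_belief`. An index policy would replace the ranking by belief in `myopic_choice` with a ranking by an index computed from the belief.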
