Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information

We study distributed cooperative decision-making in a multi-agent stochastic multi-armed bandit (MAB) problem in which agents are connected through an undirected communication graph and observe the actions and rewards of their neighbors. We develop a novel policy based on partitions of the communication graph and propose a distributed method for selecting an arbitrary number of leaders and the corresponding partitions. We analyze the new policy and evaluate its performance using Monte Carlo simulations.
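To make the leader/partition idea concrete, below is a minimal simulation sketch of a partition-based cooperative bandit policy. It assumes a simple instantiation that is not taken from the paper: each partition has a single leader that runs a UCB1-style index on the partition's pooled observations, while the other agents in the partition imitate the leader's choice. All names (`run_partition_policy`, `ucb_index`) and the specific index rule are illustrative assumptions, not the authors' algorithm.

```python
import math
import random

# Sketch of a partition-based cooperative bandit policy (assumptions, not
# the paper's algorithm): each partition has one leader running UCB1 on the
# partition's pooled observations; followers imitate the leader's choice.

def ucb_index(mean, count, t):
    """UCB1 index; the exploration constant 2 is a conventional choice."""
    return mean + math.sqrt(2.0 * math.log(t) / count)

def run_partition_policy(arm_means, partitions, horizon, seed=0):
    """Simulate one run with Bernoulli rewards.

    arm_means  -- true means of the arms
    partitions -- list of agent lists; the first agent in each acts as leader
    """
    rng = random.Random(seed)
    K = len(arm_means)
    # Per-partition sufficient statistics, pooled over all member agents.
    counts = [[0] * K for _ in partitions]
    sums = [[0.0] * K for _ in partitions]
    total_reward = 0.0

    for t in range(1, horizon + 1):
        for p, members in enumerate(partitions):
            # Leader picks the arm; unplayed arms are sampled first.
            if min(counts[p]) == 0:
                arm = counts[p].index(0)
            else:
                arm = max(
                    range(K),
                    key=lambda a: ucb_index(sums[p][a] / counts[p][a],
                                            counts[p][a], t),
                )
            # Every agent in the partition imitates the leader's choice,
            # and the resulting observations are pooled for the leader.
            for _ in members:
                r = 1.0 if rng.random() < arm_means[arm] else 0.0
                counts[p][arm] += 1
                sums[p][arm] += r
                total_reward += r
    return total_reward

# Example: four agents split into two partitions of two agents each.
if __name__ == "__main__":
    print(run_partition_policy([0.3, 0.5, 0.7], [[0, 1], [2, 3]], horizon=1000))
```

Within each partition this mimics the social-imitation structure studied in the paper: only the leader explores, and followers free-ride on its estimates using strictly local information; the distributed leader- and partition-selection procedure itself is not modeled here.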
