We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms, where the agents aim to minimize the cumulative regret. We first adapt the classical Thompson Sampling algorithm to the distributed setting and observe that, since the agents are able to communicate, communication may further reduce the regret upper bound of a distributed Thompson Sampling approach. To improve on distributed Thompson Sampling, we propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyze the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
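To make the Beta-Bernoulli setting concrete, below is a minimal Python sketch of Thompson Sampling run by M agents that pool their posterior counts after every round. This is only an illustration of the sampling mechanics under a full-communication assumption; the paper's actual communication schedule and elimination rule are not reproduced, and all names here (e.g. `distributed_thompson_sampling`, `true_means`) are hypothetical.

```python
import numpy as np

def distributed_thompson_sampling(true_means, n_agents, horizon, seed=0):
    """Sketch: M agents run Beta-Bernoulli Thompson Sampling and share
    their success/failure counts each round (full communication assumed;
    the paper's protocol and elimination rule are not modeled here)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    successes = np.ones(K)  # shared Beta(1, 1) prior over each arm
    failures = np.ones(K)
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        for _ in range(n_agents):
            # Each agent samples a mean estimate from the shared posterior
            # and pulls the arm with the largest sample.
            theta = rng.beta(successes, failures)
            arm = int(np.argmax(theta))
            reward = rng.binomial(1, true_means[arm])
            # Posterior update is pooled across agents.
            successes[arm] += reward
            failures[arm] += 1 - reward
            regret += best - true_means[arm]
    return regret

if __name__ == "__main__":
    # Example: 4 agents, 3 Bernoulli arms, horizon of 1000 rounds.
    print(distributed_thompson_sampling([0.2, 0.5, 0.7], n_agents=4, horizon=1000))
```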