Heterogeneous Explore-Exploit Strategies on Multi-Star Networks

We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making, where the goal of the agents is to maximize cumulative group reward. To do so, we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problems, agents use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e., when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to enable center agents to explore more than they would under the homogeneous strategy, as a means of providing more useful data to the peripheral agents. In the case in which all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies compared to homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds.
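To make the key idea concrete, here is a minimal simulation sketch, assuming a UCB-style index rule with a per-agent exploration gain and probabilistic broadcasting of (choice, reward) pairs. For brevity it uses a single star rather than a multi-star, and every name and parameter value (explore_gain, p_broadcast, Gaussian rewards) is an illustrative assumption, not the paper's exact algorithm or analysis setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative problem setup (all values hypothetical) ---
K = 5                      # number of options (arms)
T = 1000                   # horizon
mu = rng.uniform(0, 1, K)  # unknown arm means
p_broadcast = 0.8          # probability an agent shares its observation

# A single star: agent 0 is the center, agents 1..n-1 are peripheral.
n = 6
neighbors = {0: list(range(1, n))}
for i in range(1, n):
    neighbors[i] = [0]

# Heterogeneity: the center explores more aggressively than the periphery.
explore_gain = np.where(np.arange(n) == 0, 2.0, 1.0)

counts = np.zeros((n, K))  # observations available to each agent, per arm
sums = np.zeros((n, K))    # corresponding reward totals
group_reward = 0.0

for t in range(1, T + 1):
    draws = []
    for i in range(n):
        if np.any(counts[i] == 0):
            a = int(np.argmin(counts[i]))      # sample untried arms first
        else:
            means = sums[i] / counts[i]
            bonus = explore_gain[i] * np.sqrt(2 * np.log(t) / counts[i])
            a = int(np.argmax(means + bonus))  # UCB-style choice
        r = rng.normal(mu[a], 0.1)
        draws.append((i, a, r))
        group_reward += r

    # Each agent updates its own estimates and, with probability
    # p_broadcast, shares its (choice, reward) pair with all neighbors.
    for i, a, r in draws:
        counts[i, a] += 1
        sums[i, a] += r
        if rng.random() < p_broadcast:
            for j in neighbors[i]:
                counts[j, a] += 1
                sums[j, a] += r

print(f"group cumulative reward: {group_reward:.1f}")
print(f"oracle reward:           {n * T * mu.max():.1f}")
```

In this sketch, heterogeneity enters only through explore_gain: setting the center's gain above 1 makes it sample under-explored arms more often, and its broadcasts then give the peripheral agents better estimates than they could gather on their own.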
