Collaborative Top Distribution Identifications with Limited Interaction (Extended Abstract)

We consider the following problem: given a set of $n$ distributions, find the top-$m$ ones with the largest means. This problem is also known as top-$m$ arm identification in the reinforcement learning literature, and it has numerous applications. We study the problem in the collaborative learning model, in which multiple agents can draw samples from the $n$ distributions in parallel. Our goal is to characterize the tradeoff between the running time of the learning process and the number of rounds of interaction between the agents, since interaction is very expensive in many scenarios. We give optimal time-round tradeoffs, and we demonstrate complexity separations between top-$1$ arm identification and top-$m$ arm identification for general $m$, as well as between the fixed-time and fixed-confidence variants. As a byproduct, we also give an algorithm for selecting the distribution with the $m$-th largest mean in the collaborative learning model.

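To make the model concrete, the sketch below simulates a round-limited, elimination-style protocol for top-$m$ arm identification: in each round the agents pull the surviving arms in parallel and synchronize only once, at the round boundary, by pooling their empirical means. The function name `collaborative_top_m`, its parameters, and the Hoeffding-style accept/eliminate rule are all illustrative assumptions for exposition; this is a minimal sketch of the model, not the algorithm of this paper.

```python
import numpy as np

def collaborative_top_m(means, m, num_agents, num_rounds, budget_per_agent, seed=0):
    """Toy round-limited protocol for top-m arm identification.

    `means` holds the true Bernoulli means and is used only to simulate
    pulls; the confidence radius and the accept/eliminate rule are
    illustrative textbook-style choices, not this paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    n = len(means)
    active = list(range(n))          # arms still in contention
    accepted = []                    # arms committed to the output
    est = {a: 0.0 for a in active}   # latest pooled empirical means

    for _ in range(num_rounds):
        need = m - len(accepted)
        if need == 0 or len(active) == need:
            break
        # Parallel phase: the agents split their pulls evenly over the
        # active arms and share only empirical means at the end of the
        # round, i.e., one synchronization barrier per round.
        pulls = max(1, (num_agents * budget_per_agent) // len(active))
        est = {a: rng.binomial(pulls, means[a]) / pulls for a in active}
        width = np.sqrt(np.log(2.0 * n * num_rounds) / (2.0 * pulls))

        ranked = sorted(active, key=est.get, reverse=True)
        inside = est[ranked[need - 1]]   # worst empirical mean inside the top `need`
        outside = est[ranked[need]]      # best empirical mean outside the top `need`

        for a in list(active):
            if est[a] - width > outside + width:
                accepted.append(a)       # confidently among the top `need`
                active.remove(a)
            elif est[a] + width < inside - width:
                active.remove(a)         # confidently not among the top `need`

    # Rounds exhausted (or trivially done): fill the remaining slots
    # with the empirically best surviving arms.
    remaining = sorted(active, key=est.get, reverse=True)[: m - len(accepted)]
    return sorted(accepted + remaining)

# Example: 3 agents, 4 rounds, 200 pulls per agent per round.
# The three largest means sit at indices 0, 1, 2.
true_means = [0.9, 0.8, 0.7, 0.5, 0.45, 0.4, 0.3, 0.2]
print(collaborative_top_m(true_means, m=3, num_agents=3, num_rounds=4,
                          budget_per_agent=200))
```

The single synchronization barrier per iteration is what the round budget counts: shrinking `num_rounds` forces each round to spend more pulls on arms that could otherwise have been eliminated earlier, which is exactly the time-round tradeoff the abstract refers to.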