Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship

Online learning in a two-sided matching market, where demand-side agents continuously compete to be matched with supply-side arms, abstracts the complex partial-information interactions on matching platforms (e.g., Upwork, TaskRabbit). We study the decentralized serial dictatorship setting: a two-sided matching market in which the demand-side agents have unknown and heterogeneous valuations over the supply side (arms), while the arms have a known, uniform preference over the demand side (agents). We design the first decentralized algorithm for the agents, UCB with Decentralized Dominant-arm Deletion (UCB-D3), which requires no knowledge of reward gaps or the time horizon. UCB-D3 works in phases: in each phase, agents delete dominated arms (those preferred by higher-ranked agents) and play only among the non-dominated arms according to UCB. At the end of each phase, agents broadcast their estimated preferred arms in a decentralized fashion through pure exploitation. We prove a new regret lower bound for the decentralized serial dictatorship model and show that UCB-D3 is order-optimal.
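The phased dominate-or-delete dynamic described above can be illustrated with a minimal simulation sketch. This is not the paper's implementation: the decentralized broadcast through collisions is abstracted here as directly sharing each agent's estimated best arm at the end of a phase, phase lengths are simply doubled, and all names (`ucb_d3`, `means`, `dominated`) are illustrative assumptions. Collisions are resolved by agent rank, as in serial dictatorship, with colliding lower-ranked agents receiving no reward.

```python
import math
import random

def ucb_d3(means, n_agents, horizon, seed=0):
    """Sketch of a phased UCB with dominated-arm deletion.

    means[i][k] is agent i's (unknown to the agents) Bernoulli mean
    for arm k; agent 0 is the highest-ranked dictator. Returns each
    agent's estimated preferred arm after the final phase.
    """
    rng = random.Random(seed)
    K = len(means[0])
    counts = [[0] * K for _ in range(n_agents)]
    sums = [[0.0] * K for _ in range(n_agents)]
    dominated = [set() for _ in range(n_agents)]  # arms claimed by higher ranks
    best = [0] * n_agents
    t, phase = 0, 1
    while t < horizon:
        for _ in range(2 ** phase):  # doubling phase lengths (an assumption)
            if t >= horizon:
                break
            # Each agent proposes the UCB-maximizing non-dominated arm.
            def ucb(i, k):
                if counts[i][k] == 0:
                    return float('inf')  # force initial exploration
                mean = sums[i][k] / counts[i][k]
                return mean + math.sqrt(2 * math.log(t + 1) / counts[i][k])
            proposals = []
            for i in range(n_agents):
                avail = [k for k in range(K) if k not in dominated[i]]
                proposals.append(max(avail, key=lambda k: ucb(i, k)))
            # Serial dictatorship: ties go to the higher-ranked agent;
            # colliding agents get no reward and no observation.
            taken = set()
            for i in range(n_agents):
                a = proposals[i]
                if a not in taken:
                    taken.add(a)
                    counts[i][a] += 1
                    sums[i][a] += 1.0 if rng.random() < means[i][a] else 0.0
            t += 1
        # End of phase: each agent "broadcasts" its empirically best
        # non-dominated arm; lower ranks then delete those arms.
        for i in range(n_agents):
            avail = [k for k in range(K) if k not in dominated[i]]
            best[i] = max(avail, key=lambda k: sums[i][k] / counts[i][k]
                          if counts[i][k] else 0.0)
        for i in range(n_agents):
            dominated[i] = {best[j] for j in range(i)}
        phase += 1
    return best
```

With well-separated means and a long enough horizon, the agents settle on the serial-dictatorship matching: the top agent on its best arm, the next agent on its best arm among the remainder.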
