Bandit Learning in Decentralized Matching Markets

We study two-sided matching markets in which one side of the market (the players) has no a priori knowledge of its preferences over the other side (the arms) and must learn these preferences from experience. We also assume that the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multi-player setting with competition. We introduce a new algorithm for this setting that, over a time horizon $T$, attains $\mathcal{O}(\log(T))$ stable regret when the arms' preferences over players are shared, and $\mathcal{O}(\log(T)^2)$ regret when no assumptions are made on the preferences of either side.
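To make the setting concrete, the following is a minimal sketch of a single round of such a market; it is not the algorithm introduced here. It assumes a conflict rule common in the competing-bandits literature, which the abstract itself does not spell out: when several players pull the same arm, the arm accepts the proposer it ranks highest, and every rejected player receives zero reward. The names `resolve_round`, `play_round`, `arm_prefs`, and `mean_reward`, as well as the unit-variance Gaussian noise, are illustrative assumptions.

```python
import random


def resolve_round(proposals, arm_prefs):
    """Decide which proposing player each arm accepts.

    proposals: dict mapping player -> arm pulled this round.
    arm_prefs: dict mapping arm -> list of players, most preferred first.
    Returns a dict mapping each accepted player -> arm.
    Assumed rule: an arm accepts its most preferred proposer.
    """
    matched = {}
    for arm, ranking in arm_prefs.items():
        proposers = [p for p, a in proposals.items() if a == arm]
        if proposers:
            winner = min(proposers, key=ranking.index)
            matched[winner] = arm
    return matched


def play_round(proposals, arm_prefs, mean_reward, rng):
    """Return each player's reward for one round.

    mean_reward[player][arm] is the mean reward, unknown to the player.
    Accepted players observe a noisy reward (Gaussian noise is an
    assumption); rejected players receive 0.0.
    """
    matched = resolve_round(proposals, arm_prefs)
    rewards = {}
    for player, arm in proposals.items():
        if matched.get(player) == arm:
            rewards[player] = rng.gauss(mean_reward[player][arm], 1.0)
        else:
            rewards[player] = 0.0
    return rewards


# Two players, two arms; both arms rank p1 above p2 (shared preferences).
arm_prefs = {"a": ["p1", "p2"], "b": ["p1", "p2"]}
mean_reward = {"p1": {"a": 1.0, "b": 0.5}, "p2": {"a": 0.8, "b": 0.3}}
rng = random.Random(0)

# Both players pull arm "a": p1 is accepted and p2 is rejected.
collision = play_round({"p1": "a", "p2": "a"}, arm_prefs, mean_reward, rng)
```

Each player sees only its own outcome, so a learning rule built on top of this sketch would have to infer both its own preferences and the competition it faces from those observations alone.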
