Competing Bandits in Matching Markets

Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandit framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single-player multi-armed bandit setting.
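To make the model concrete, here is a minimal sketch of one natural centralized approach consistent with the abstract: a platform maintains UCB-style reward estimates on behalf of the players, ranks arms by those indices each round, and computes a stable matching with player-proposing Gale-Shapley using the arms' known preferences. This is an illustrative simulation under assumed parameters (Gaussian rewards, a standard UCB bonus, the specific means and horizon below), not necessarily the paper's exact algorithm.

```python
# Illustrative sketch (not the paper's exact algorithm): a centralized platform
# that (i) ranks arms for each player by UCB indices built from observed
# stochastic rewards, (ii) computes a stable matching via player-proposing
# Gale-Shapley using those ranks and the arms' known preferences, and
# (iii) lets each matched player pull its assigned arm. All parameter choices
# (horizon, reward means, confidence width) are assumptions for illustration.

import numpy as np


def gale_shapley(player_prefs, arm_rank, n_players, n_arms):
    """Player-proposing deferred acceptance.

    player_prefs[i]: arm indices in decreasing preference order for player i.
    arm_rank[a][i]: rank arm a assigns to player i (lower = more preferred).
    Returns match[i] = arm assigned to player i (assumes n_players <= n_arms).
    """
    next_proposal = [0] * n_players      # next position in each player's list
    holder = [None] * n_arms             # player currently held by each arm
    free = list(range(n_players))
    while free:
        i = free.pop()
        a = player_prefs[i][next_proposal[i]]
        next_proposal[i] += 1
        if holder[a] is None:
            holder[a] = i
        elif arm_rank[a][i] < arm_rank[a][holder[a]]:
            free.append(holder[a])       # arm a prefers player i; bump incumbent
            holder[a] = i
        else:
            free.append(i)               # rejected; player i proposes again later
    match = [None] * n_players
    for a, i in enumerate(holder):
        if i is not None:
            match[i] = a
    return match


def centralized_ucb_matching(true_means, arm_prefs, horizon, seed=0):
    """Run the sketched centralized learning loop.

    true_means[i, a]: mean reward of arm a for player i (unknown to players).
    arm_prefs[a]: player indices in arm a's decreasing preference order (known).
    """
    rng = np.random.default_rng(seed)
    n_players, n_arms = true_means.shape
    counts = np.zeros((n_players, n_arms))
    sums = np.zeros((n_players, n_arms))
    arm_rank = [{i: r for r, i in enumerate(pref)} for pref in arm_prefs]

    for t in range(1, horizon + 1):
        means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0))
        ucb = np.where(counts > 0, means + bonus, np.inf)   # unplayed arms first
        player_prefs = [list(np.argsort(-ucb[i])) for i in range(n_players)]
        match = gale_shapley(player_prefs, arm_rank, n_players, n_arms)
        for i, a in enumerate(match):
            reward = rng.normal(true_means[i, a], 1.0)       # stochastic reward
            counts[i, a] += 1
            sums[i, a] += reward
    return counts


if __name__ == "__main__":
    # Two players, three arms; every arm prefers player 0 over player 1.
    true_means = np.array([[1.0, 0.5, 0.2],
                           [0.9, 0.6, 0.1]])
    arm_prefs = [[0, 1], [0, 1], [0, 1]]
    pulls = centralized_ucb_matching(true_means, arm_prefs, horizon=2000)
    print("pull counts per (player, arm):\n", pulls)
```

Over time the pull counts concentrate on the stable matching between players' true preferences and the arms' preferences; the decentralized variants discussed in the paper remove the coordinating platform and are not sketched here.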
