Learning Equilibria in Matching Markets from Bandit Feedback

Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. But since preferences are inherently uncertain during learning, the classical notion of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) is unattainable in these settings. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that “optimism in the face of uncertainty,” the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
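To give a rough sense of how optimism can be combined with matching, the following minimal sketch inflates each empirical pair surplus by a UCB1-style confidence bonus and then selects the matching that maximizes the total optimistic surplus. It is an illustration under stated assumptions, not the algorithm analyzed in this work: the function names (`ucb_matrix`, `max_weight_matching`, `optimistic_matching`), the bonus `sqrt(2 log t / n)`, the square surplus matrix, and the brute-force search over permutations are all hypothetical choices made for brevity, and the sketch omits transfers entirely.

```python
import itertools
import math


def ucb_matrix(mean, count, t):
    """Optimistic surplus estimates: empirical mean plus a UCB1-style bonus.

    mean[i][j]  -- assumed empirical mean surplus of pairing agent i with agent j
    count[i][j] -- number of times that pair has been observed (treated as >= 1)
    t           -- current round, t >= 1
    """
    return [
        [m + math.sqrt(2.0 * math.log(t) / max(n, 1)) for m, n in zip(m_row, n_row)]
        for m_row, n_row in zip(mean, count)
    ]


def max_weight_matching(weights):
    """Brute-force maximum-weight perfect matching (only sensible for small n).

    Returns a list `match` where agent i on one side is paired with match[i].
    """
    n = len(weights)
    best = max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(weights[i][perm[i]] for i in range(n)),
    )
    return list(best)


def optimistic_matching(mean, count, t):
    """Pick the matching that is optimal with respect to the optimistic estimates."""
    return max_weight_matching(ucb_matrix(mean, count, t))


if __name__ == "__main__":
    mean = [[0.9, 0.1], [0.2, 0.8]]
    # Well-explored diagonal pairs, barely explored off-diagonal pairs.
    count = [[100, 1], [1, 100]]
    print(optimistic_matching(mean, count, t=100))
```

In this toy instance the empirically best matching is the diagonal one, but the large confidence bonus on the rarely observed off-diagonal pairs leads the optimistic rule to try them instead, which is the exploration behavior that optimism is meant to induce.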

[1] Devavrat Shah, et al. Regret, stability, and fairness in matching markets with bandit learners, 2021, ArXiv.

[2] Yinyu Ye, et al. The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks, 2021, ICML.

[3] M. Rothschild. A two-armed bandit theory of market pricing, 1974.

[4] Yishay Mansour, et al. Bayesian Incentive-Compatible Bandit Exploration, 2018.

[5] Branislav Kveton, et al. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits, 2015.

[6] Michael I. Jordan, et al. Competing Bandits in Matching Markets, 2019, AISTATS.

[7] Alexandre Proutière, et al. Combinatorial Bandits Revisited, 2015, NIPS.

[8] Alessandro Lazaric, et al. An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits, 2020, NeurIPS.

[9] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[10] Jon M. Kleinberg, et al. Incentivizing exploration, 2014, EC.

[11] L. Shapley, et al. Quasi-Cores in a Monetary Economy with Nonconvex Preferences, 1966.

[12] Chandra R. Chegireddy, et al. Algorithms for finding K-best perfect matchings, 1987, Discret. Appl. Math.

[13] Martin Bichler, et al. Walrasian equilibria from an optimization perspective: A guide to the literature, 2020, Naval Research Logistics (NRL).

[14] Herbert E. Scarf, et al. A Limit Theorem on the Core of an Economy, 1963, Classics in Game Theory.

[15] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[16] Karthik Abinav Sankararaman, et al. Beyond log^2(T) Regret for Decentralized Bandits in Matching Markets, 2021, ICML.

[17] Itai Ashlagi, et al. Communication Requirements and Informative Signaling in Matching Markets, 2017, EC.

[18] Horia Mania, et al. Bandit Learning in Decentralized Matching Markets, 2020, ArXiv.

[19] David Kempe, et al. The Complexity of Interactively Learning a Stable Matching by Trial and Error, 2020, EC.

[20] Csaba Szepesvari, et al. Bandit Algorithms, 2020.

[21] Juan Enrique Martínez-Legaz, et al. Dual representation of cooperative games based on Fenchel-Moreau conjugation, 1996.

[22] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.

[23] Rafail Ostrovsky, et al. A Stable Marriage Requires Communication, 2014, SODA.

[24] Sanmay Das, et al. Two-Sided Bandits and the Dating Market, 2005, IJCAI.

[25] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.

[26] Nicole Immorlica, et al. Adversarial Bandits with Knapsacks, 2018, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[27] Karthik Abinav Sankararaman, et al. Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship, 2021, AISTATS.

[28] Frank Thomson Leighton, et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[29] L. S. Shapley, et al. College Admissions and the Stability of Marriage, 2013, Am. Math. Mon.

[30] SangMok Lee, et al. The Revealed Preference Theory of Stable and Extremal Stable Matchings, 2010.

[31] Bhaskar Krishnamachari, et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations, 2010, IEEE/ACM Transactions on Networking.

[32] Zhiwei Steven Wu, et al. Competing Bandits: The Perils of Exploration under Competition, 2019, ArXiv.

[33] Yashodhan Kanoria, et al. Matching while Learning, 2016, EC.

[34] Aleksandrs Slivkins, et al. Bandits with Knapsacks, 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.

[35] Harold W. Kuhn, et al. The Hungarian method for the assignment problem, 1955, 50 Years of Integer Programming.

[36] Max Alston, et al. On the non-existence of stable matches with incomplete information, 2020, Games Econ. Behav.

[37] Qingmin Liu, et al. Stability and Bayesian Consistency in Two-Sided Markets, 2020, American Economic Review.

[38] Larry Samuelson, et al. Stable Matching with Incomplete Information (Second Version), 2012.

[39] Sushil Bikhchandani, et al. Stability with One-Sided Incomplete Information, 2017, J. Econ. Theory.

[40] Eduardo M. Azevedo, et al. Existence of Equilibrium in Large Matching Markets With Complementarities, 2018.

[41] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[42] Peng Shi, et al. Efficient Matchmaking in Assignment Games with Application to Online Platforms, 2020, EC.

[43] Wei Chen, et al. Combinatorial Multi-Armed Bandit: General Framework and Applications, 2013, ICML.

[44] L. Shapley, et al. The assignment game I: The core, 1971.