Mechanism Design with Bandit Feedback

We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values. In each round, a mechanism assigns an allocation to each agent in a set of agents and charges each of them a price; the agents then provide (stochastic) feedback to the mechanism on the allocation they received. This is motivated by applications in cloud markets and online advertising, where an agent may know her value for an allocation only after experiencing it. The mechanism therefore needs to explore different allocations for each agent while simultaneously attempting to find the socially optimal set of allocations. Our focus is on truthful and individually rational mechanisms which imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret: for the welfare, for the individual utility of each agent, and for the utility of the mechanism. We show that these three terms are interdependent via an $\Omega(T^{2/3})$ lower bound on the maximum of the three after $T$ rounds of allocations, and describe a family of anytime algorithms which achieve this rate. Our framework provides flexibility to control the pricing scheme so as to trade off the agent and seller regrets, and additionally to control the degree of truthfulness and individual rationality.
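For orientation, the classical VCG mechanism that serves as the long-run benchmark can be sketched as follows. This is only a minimal illustration under assumptions not taken from the paper: values are fully known (whereas the setting above must learn them from bandit feedback), each agent receives exactly one distinct item, and the optimal assignment is found by brute force. The names `best_assignment`, `vcg`, and the example `values` matrix are hypothetical.

```python
from itertools import permutations


def best_assignment(values, agents):
    """Brute-force the welfare-maximising assignment of distinct items to `agents`.

    values[a][s] is the (assumed known) value of agent a for item s.
    Returns (welfare, {agent: item}).
    """
    n_items = len(values[0])
    best_welfare, best_alloc = float("-inf"), None
    for items in permutations(range(n_items), len(agents)):
        welfare = sum(values[a][s] for a, s in zip(agents, items))
        if welfare > best_welfare:
            best_welfare, best_alloc = welfare, dict(zip(agents, items))
    return best_welfare, best_alloc


def vcg(values):
    """Classical VCG: pick the socially optimal allocation and charge each
    agent the externality she imposes on the other agents."""
    agents = list(range(len(values)))
    welfare, alloc = best_assignment(values, agents)
    prices = {}
    for i in agents:
        others = [a for a in agents if a != i]
        # Best welfare the others could achieve if agent i were absent.
        welfare_without_i, _ = best_assignment(values, others)
        # Welfare the others actually get in the chosen allocation.
        others_welfare_with_i = welfare - values[i][alloc[i]]
        prices[i] = welfare_without_i - others_welfare_with_i
    return alloc, prices, welfare


# Hypothetical example: two agents, two items.
values = [[10, 3],
          [8, 2]]
alloc, prices, welfare = vcg(values)
print(alloc, prices, welfare)
```

In this example each agent's utility (value minus price) is non-negative, which illustrates individual rationality in the full-information case. A learning mechanism of the kind described above would have to replace the known `values` with estimates built from feedback.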
