Online learning in repeated auctions

Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical, and we simply compare our performance against that of the best fixed bid in hindsight. We show that sublinear regret is also achievable in this case and prove matching minimax lower bounds. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
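To make the stochastic model concrete, here is a minimal Python sketch of one natural optimistic strategy. It is an illustration, not necessarily the authors' algorithm. The bidder bids an upper confidence bound on the good's value, computed only from the noisy values observed on rounds it actually won. Every specific below is an assumption chosen for illustration: values and bids lie in [0, 1], the highest competing bid is uniform on [0, 1], observations are Bernoulli with mean equal to the true value, and the exploration constant 2 in the confidence width is arbitrary. The functions `round_regret` and `simulate` are hypothetical names. Regret is measured against truthful bidding, which is optimal in a Vickrey auction when the value is known.

```python
import math
import random


def round_regret(value: float, bid: float, price: float) -> float:
    """Per-round regret in a Vickrey (second-price) auction.

    The benchmark bids the true value, which is optimal in a second-price
    auction when the value is known. Whoever wins pays the competing bid
    `price`; ties are assumed (for illustration) to go to our bidder.
    """
    oracle_gain = value - price if value >= price else 0.0
    our_gain = value - price if bid >= price else 0.0
    return oracle_gain - our_gain


def simulate(strategy: str, value: float = 0.5, horizon: int = 5000, seed: int = 0) -> float:
    """Cumulative regret of a bidding strategy against truthful bidding.

    Hypothetical setup: the highest competing bid is drawn uniformly from
    [0, 1] each round, and a purchased good reveals a Bernoulli(value)
    observation, i.e. a noisy value centered on the true value.
    """
    price_rng = random.Random(seed)    # competing bids
    obs_rng = random.Random(seed + 1)  # observation noise
    wins, total_obs, regret = 0, 0.0, 0.0
    for t in range(1, horizon + 1):
        if strategy == "ucb":
            if wins == 0:
                bid = 1.0  # no information yet: bid the maximum to force a purchase
            else:
                mean = total_obs / wins
                # Optimistic bid: empirical mean plus a confidence width (assumed constant 2).
                bid = min(1.0, mean + math.sqrt(2.0 * math.log(t) / wins))
        elif strategy == "max":
            bid = 1.0  # naive baseline: always wins but ignores its observations
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        price = price_rng.random()
        if bid >= price:
            # Bandit feedback: information about the value arrives only on a purchase.
            wins += 1
            total_obs += 1.0 if obs_rng.random() < value else 0.0
        regret += round_regret(value, bid, price)
    return regret


if __name__ == "__main__":
    for name in ("ucb", "max"):
        print(name, round(simulate(name), 1))
```

Optimism suits this one-sided feedback: a bidder who underbids wins less often and so collects fewer of the observations that would correct its estimate, while overbidding only costs the gap between the price paid and the true value on rounds where the competing bid lands between the two.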
