Learning to Bid Without Knowing your Value

We address online learning in complex auction settings, such as sponsored search auctions, where the bidder's value is unknown to her, may evolve arbitrarily over time, and is observed only when she wins an allocation. We leverage the structure of the bidder's utility and the partial feedback that bidders typically receive in auctions to give algorithms whose regret against the best fixed bid in hindsight converges exponentially faster, in its dependence on the size of the action space, than that of a generic bandit algorithm, and is almost equivalent to what is achievable with full information. Our results are enabled by analyzing a new online learning setting with outcome-based feedback, which generalizes learning with feedback graphs. For this setting we give an online learning algorithm, of independent interest, whose regret grows only logarithmically in the number of actions and linearly in the number of potential outcomes (the latter being very small in most auction settings). Last but not least, we show experimentally that our algorithm outperforms the bandit approach and that this advantage is robust to dropping some of our theoretical assumptions and to noise in the feedback the bidder receives.

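To make the outcome-based-feedback idea concrete, here is a minimal Python sketch for the simplest instance: a single-item second-price auction with two outcomes (win/lose). The function name `win_exp_sketch`, the uniform bid grid, the clipped update, and the example parameters are our own illustrative choices, not the paper's exact estimator; the point the sketch demonstrates is that a single win reveals the counterfactual utility of every bid on the grid, so an importance-weighted exponential-weights update needs only the two outcomes rather than one observation per arm.

```python
import numpy as np

def win_exp_sketch(T, bid_grid, values, others_bids, eta=None, rng=None):
    """Exponential-weights bidder under outcome-based (win/lose) feedback
    in a single-item second-price auction.

    A win at round t reveals the value v_t and the price h_t (highest
    competing bid), which pins down the counterfactual utility of every
    bid on the grid; a loss contributes a zero estimate.  Dividing by
    q_t, the probability of winning under the current bid distribution,
    keeps the estimate unbiased.  Illustrative sketch, not the paper's
    exact algorithm.
    """
    rng = rng or np.random.default_rng(0)
    bid_grid = np.asarray(bid_grid, dtype=float)
    K = len(bid_grid)
    if eta is None:
        eta = np.sqrt(np.log(K) / T)      # standard Hedge-style rate
    log_w = np.zeros(K)                   # log-weights for numerical stability
    total_utility = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        b_idx = rng.choice(K, p=p)
        b, v, h = bid_grid[b_idx], values[t], others_bids[t]
        if b > h:                         # win (ties count as losses, for simplicity)
            total_utility += v - h
            q = p[bid_grid > h].sum()     # prob. of the observed "win" outcome under p
            # Counterfactual utilities: bids above h win and pay h, the rest lose.
            u_hat = np.where(bid_grid > h, (v - h) / q, 0.0)
        else:                             # loss: zero utility, no value revealed
            u_hat = np.zeros(K)
        # Clipping is a crude practical safeguard against huge importance
        # weights when q is tiny; it trades exact unbiasedness for stability.
        log_w += eta * np.clip(u_hat, -1.0 / eta, 1.0 / eta)
    return total_utility

# Example run (hypothetical parameters): value fixed at 0.8,
# competing bids drawn uniformly from [0, 1].
rng = np.random.default_rng(1)
T = 10_000
print(win_exp_sketch(T, np.linspace(0.0, 1.0, 101),
                     values=np.full(T, 0.8),
                     others_bids=rng.uniform(0.0, 1.0, T),
                     rng=rng))
```

Note the contrast with a generic bandit update, which would touch only the sampled bid each round: here one win updates all K bids at once, which is what drives the logarithmic (rather than polynomial) dependence on the number of actions.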