Online Learning for Measuring Incentive Compatibility in Ad Auctions

In this paper, we investigate the problem of measuring end-to-end Incentive Compatibility (IC) regret given black-box access to an auction mechanism. Our goals are to (1) compute an estimate of IC regret in an auction, (2) provide a measure of certainty around that estimate, and (3) minimize the time it takes to arrive at an accurate estimate. We consider two main problems with different informational assumptions: in the advertiser problem, the goal is to measure IC regret for some known valuation v, while in the more general demand-side platform (DSP) problem, we wish to determine the worst-case IC regret over all possible valuations. Both problems are naturally phrased in an online learning model, and we design algorithms for each. We give an online learning algorithm where for the advertiser problem the error of determining IC shrinks as (where B is the finite set of bids, T is the number of time steps, and n is the number of auctions per time step), and for the DSP problem it shrinks as . For the DSP problem, we also consider a stronger notion of IC regret estimation and extend our algorithm to achieve a smaller IC regret error. We validate the theoretical results using simulations with Generalized Second Price (GSP) auctions, which are known not to be incentive compatible and thus have strictly positive IC regret.
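To make the two problems concrete, here is a small simulation sketch. It is not the paper's online learning algorithm: it swaps in a plain uniform-sampling Monte Carlo estimator with Hoeffding confidence bounds. It assumes (all hypothetical, for illustration only) a black box `gsp_outcome` that runs a two-slot GSP auction with click-through rates (1.0, 0.5) against one competitor bidding uniformly on [0, 1); an additive definition of IC regret, max over b in B of E[v x(b) - p(b)] minus E[v x(v) - p(v)], where x is the click probability and p the payment; and valuations restricted to the finite bid set B. Because allocation and payment do not depend on v, one batch of samples serves both the advertiser problem (fixed v) and the DSP problem (maximum over v).

```python
import math
import random


def gsp_outcome(bid, rng, ctrs=(1.0, 0.5), n_competitors=1):
    """Hypothetical black-box GSP auction with uniform [0, 1) competing bids.

    Returns (click_prob, payment) for our bid: the won slot's CTR and the
    CTR times the per-click price (the highest competing bid below ours).
    Assumes every CTR is at most 1.
    """
    competitors = [rng.random() for _ in range(n_competitors)]
    rank = sum(1 for c in competitors if c >= bid)  # ties go to the competitor
    if rank >= len(ctrs):
        return 0.0, 0.0  # no slot won
    price = max((c for c in competitors if c < bid), default=0.0)
    return ctrs[rank], ctrs[rank] * price


def estimate_outcomes(bids, n, seed=0):
    """Average click probability x(b) and payment p(b) over n queries per bid."""
    rng = random.Random(seed)
    est = {}
    for b in bids:
        outcomes = [gsp_outcome(b, rng) for _ in range(n)]
        est[b] = (sum(x for x, _ in outcomes) / n, sum(p for _, p in outcomes) / n)
    return est


def ic_regret(value, est):
    """Advertiser problem: best estimated utility over B minus truthful utility.

    Assumes `value` is itself one of the bids in B.
    """
    utility = {b: value * x - p for b, (x, p) in est.items()}
    return max(utility.values()) - utility[value]


def worst_case_ic_regret(est):
    """DSP problem, restricted to valuations in B: (max regret, maximizing value)."""
    return max((ic_regret(v, est), v) for v in est)


def confidence_radius(value, bids, n, delta=0.05):
    """Hoeffding plus a union bound over the 2|B| means x(b) in [0, 1] and
    p(b) in [0, max(B)]; with probability >= 1 - delta the regret estimate
    is within this radius of the true regret, simultaneously for all values."""
    s = math.sqrt(math.log(4 * len(bids) / delta) / (2 * n))
    return 2 * (value + max(bids)) * s
```

In this toy setting, x(b) = 0.5 + 0.5b and p(b) = b²/2, so the best response to valuation v is to bid v/2 and the regret of truthful bidding is v²/8. With B = {0, 0.25, 0.5, 0.75, 1} and a few thousand samples per bid, `ic_regret(1.0, est)` should land near 0.125 with a lower confidence bound above zero, and `worst_case_ic_regret(est)` should pick v = 1, in line with GSP having strictly positive IC regret.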
