Paper Information - Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions

Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy that sets reserve prices by observing past sales data, and her objective is to minimize her revenue regret, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have an incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose two learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on future reserve prices. The first policy, called Contextual Robust Pricing (CORP), is designed for the setting where the market noise distribution is known to the seller and achieves a $T$-period regret of $O(d\log(Td) \log (T))$, where $d$ is the dimension of the contextual information. The second policy, a variant of the first, is called Stable CORP (SCORP). It is tailored to the setting where the market noise distribution is unknown to the seller and belongs to an ambiguity set. We show that the SCORP policy has a $T$-period regret of $O(\sqrt{d\log(Td)}\;T^{2/3})$.
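To make the revenue and regret notions in the abstract concrete, the following Python sketch computes the seller's revenue in one second-price auction with a reserve price and the cumulative gap between a benchmark and a policy. It is only an illustration under stated assumptions, not the CORP or SCORP policy: it assumes a single common reserve per auction, it works with realized rather than expected revenues, and the function names `second_price_revenue` and `cumulative_regret` are hypothetical.

```python
from typing import Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Seller revenue in one second-price auction with a common reserve.

    Assumption for illustration: every buyer faces the same reserve price.
    """
    ranked = sorted(bids, reverse=True)
    if not ranked or ranked[0] < reserve:
        # No bid clears the reserve, so the item goes unsold.
        return 0.0
    # The winner pays the larger of the runner-up bid and the reserve.
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return max(runner_up, reserve)


def cumulative_regret(
    benchmark_revenues: Sequence[float],
    policy_revenues: Sequence[float],
) -> float:
    """Sum of per-period revenue gaps between a benchmark and a policy.

    Assumption for illustration: both sequences hold realized per-period
    revenues of equal length; the abstract's regret is measured against a
    clairvoyant policy that knows buyers' preferences.
    """
    if len(benchmark_revenues) != len(policy_revenues):
        raise ValueError("revenue sequences must have the same length")
    return sum(b - p for b, p in zip(benchmark_revenues, policy_revenues))
```

For instance, with bids of 3, 5, and 1 and a reserve of 2, the highest bidder wins and pays the runner-up bid of 3; raising the reserve to 4 raises the payment to 4, and a reserve of 6 leaves the item unsold. This is why the choice of reserve price matters for revenue, and why a buyer may shade bids if future reserves are learned from them.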
