Dynamic Pricing with Demand Covariates

We consider a firm that sells a product over T periods without knowing the demand function. The firm sequentially sets prices both to earn revenue and to learn the underlying demand function. In practice, this problem is commonly solved via greedy iterative least squares (GILS). In each period, GILS estimates demand as a linear function of price by applying least squares to the prior prices and realized demands, and then charges the price that maximizes revenue under the estimated demand function. Performance is measured by the regret, which is the expected revenue loss relative to an oracle that knows the true demand function. Recently, den Boer and Zwart (2014) and Keskin and Zeevi (2014) demonstrated that GILS is suboptimal and introduced optimal algorithms that integrate forced price dispersion with GILS. Here, we consider this dynamic pricing problem in a data-rich environment. We assume that the firm has access to demand covariates that may be predictive of demand, and we prove that GILS then achieves an asymptotically optimal regret of order log(T). We also show that the asymptotic optimality of GILS holds even when the covariates are uninformative. We validate our results via simulations on synthetic and real data.
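
The GILS loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the linear demand form d = alpha'x + beta*p + noise, the parameter values, the price bounds, the three random warm-up prices, and the function names gils_price and simulate are all assumptions chosen for illustration.

```python
import numpy as np

def gils_price(X_hist, p_hist, d_hist, x_now, p_min, p_max):
    """Greedy price for the current period from a least-squares demand fit.

    Assumed (hypothetical) demand model: d = alpha @ x + beta * p + noise,
    where x contains an intercept entry followed by the covariates.
    """
    # Regress realized demand on covariates and price.
    A = np.column_stack([X_hist, p_hist])
    theta, *_ = np.linalg.lstsq(A, d_hist, rcond=None)
    alpha_hat, beta_hat = theta[:-1], theta[-1]
    c = float(alpha_hat @ x_now)  # estimated demand at price zero

    def est_revenue(p):
        return p * (c + beta_hat * p)

    if beta_hat >= 0:
        # Estimated revenue is convex in p, so its maximum over the
        # interval sits at one of the two endpoints.
        return max((p_min, p_max), key=est_revenue)
    # Estimated revenue is concave: clip the unconstrained maximizer.
    return float(np.clip(-c / (2.0 * beta_hat), p_min, p_max))

def simulate(T=500, seed=0):
    """Run the greedy policy on synthetic data and return prices and regret."""
    rng = np.random.default_rng(seed)
    alpha, beta = np.array([10.0, 2.0]), -1.5  # hypothetical true parameters
    p_min, p_max = 0.5, 10.0                   # hypothetical price bounds
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    prices, demands = np.empty(T), np.empty(T)
    regret = 0.0
    for t in range(T):
        if t < 3:
            # A few random prices so the first least-squares fit is identified.
            p = rng.uniform(p_min, p_max)
        else:
            p = gils_price(X[:t], prices[:t], demands[:t], X[t], p_min, p_max)
        base = X[t] @ alpha
        prices[t] = p
        demands[t] = base + beta * p + rng.normal()
        # Oracle price under the true parameters, and the per-period loss.
        p_star = np.clip(-base / (2.0 * beta), p_min, p_max)
        regret += p_star * (base + beta * p_star) - p * (base + beta * p)
    return prices, regret

if __name__ == "__main__":
    prices, regret = simulate()
    print(f"cumulative regret after {len(prices)} periods: {regret:.2f}")
```

In this sketch the covariate changes the revenue-maximizing price from period to period, so the greedy prices vary over time even though no deliberate price experimentation is added after the warm-up.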

[1]  H. Robbins,et al.  Adaptive Design and Stochastic Approximation , 1979 .

[2]  J. George Shanthikumar,et al.  A practical inventory control policy using operational statistics , 2005, Oper. Res. Lett..

[3]  A. V. den Boer,et al.  Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions , 2013 .

[4]  Omar Besbes,et al.  Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms , 2009, Oper. Res..

[5]  A. Zeevi,et al.  A Linear Response Bandit Problem , 2013 .

[6]  Inchi Hu,et al.  On consistency of Bayes estimates in a certainty equivalence adaptive system , 1998, IEEE Trans. Autom. Control..

[7]  Frank Thomson Leighton,et al.  The value of knowing a demand curve: bounds on regret for online posted-price auctions , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science.

[8]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[9]  W. R. Thompson On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .

[10]  X. Chao,et al.  Nonparametric Learning Algorithms for Joint Pricing and Inventory Control with Lost-Sales and Censored Demand , 2015 .

[11]  John Langford,et al.  Efficient Optimal Learning for Contextual Bandits , 2011, UAI.

[12]  P. Rousseeuw,et al.  Wiley Series in Probability and Mathematical Statistics , 2005 .

[13]  Arnoud V. den Boer,et al.  Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution , 2014, Math. Oper. Res..

[14]  M. Puterman,et al.  Learning and pricing in an internet environment with binomial demands , 2005 .

[15]  Philippe Rigollet,et al.  Nonparametric Bandits with Covariates , 2010, COLT.

[16]  Assaf Zeevi,et al.  Performance Limitations in Bandit Problems with Side Observations , 2007 .

[17]  Bert Zwart,et al.  Dynamic Pricing and Learning with Finite Inventories , 2013, Oper. Res..

[18]  John Shawe-Taylor,et al.  PAC-Bayesian Analysis of Contextual Bandits , 2011, NIPS.

[19]  T. L. Lai and Herbert Robbins.  Asymptotically Efficient Adaptive Allocation Rules , 1985 .

[20]  Omar Besbes,et al.  On the Minimax Complexity of Pricing in a Changing Environment , 2011, Oper. Res..

[21]  David Simchi-Levi,et al.  Online Network Revenue Management Using Thompson Sampling , 2017, Oper. Res..

[22]  Arnoud V. den Boer Tracking the market: Dynamic pricing and learning in a changing environment , 2015, Eur. J. Oper. Res..

[23]  Assaf J. Zeevi,et al.  Chasing Demand: Learning and Earning in a Changing Environment , 2016, Math. Oper. Res..

[24]  Vianney Perchet,et al.  The multi-armed bandit problem with covariates , 2011, ArXiv.

[25]  Ilya Segal,et al.  Optimal Pricing Mechanisms with Unknown Demand , 2002 .

[26]  Bert Zwart,et al.  Simultaneously Learning and Optimizing Using Controlled Variance Pricing , 2014, Manag. Sci..

[27]  Renato Paes Leme,et al.  Feature-based Dynamic Pricing , 2016, EC.

[28]  Assaf J. Zeevi,et al.  Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies , 2014, Oper. Res..

[29]  J. Tropp User-Friendly Tail Bounds for Matrix Martingales , 2011 .

[30]  Wei Chu,et al.  A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[31]  Umar Syed,et al.  Repeated Contextual Auctions with Strategic Buyers , 2014, NIPS.

[32]  M. Woodroofe A One-Armed Bandit Problem with a Concomitant Variable , 1979 .

[33]  Csaba Szepesvári,et al.  Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[34]  Josef Broder,et al.  Dynamic Pricing Under a General Parametric Choice Model , 2012, Oper. Res..

[35]  Cynthia Rudin,et al.  The Big Data Newsvendor: Practical Insights from Machine Learning Analysis , 2013 .

[36]  Wei Chu,et al.  Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.

[37]  Victor F. Araman,et al.  Dynamic Pricing for Nonperishable Products with Demand Learning , 2009, Oper. Res..

[38]  Mohsen Bayati,et al.  Online Decision-Making with High-Dimensional Covariates , 2015 .

[39]  Georgia Perakis,et al.  The Data-Driven Newsvendor Problem: New Bounds and Insights , 2015, Oper. Res..

[40]  J. Michael Harrison,et al.  Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution , 2011, Manag. Sci..

[41]  H. Robbins,et al.  Iterated least squares in multiperiod control , 1982 .

[42]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[43]  Alejandro Francetich,et al.  Choosing a Good Toolkit: An Essay in Behavioral Economics , 2014 .

[44]  J. Langford,et al.  The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS.

[45]  John Langford,et al.  Resourceful Contextual Bandits , 2014, COLT.

[46]  J. Sarkar One-Armed Bandit Problems with Covariates , 1991 .

[47]  Assaf J. Zeevi,et al.  A Note on Performance Limitations in Bandit Problems With Side Information , 2011, IEEE Transactions on Information Theory.