Nonparametric Pricing Analytics with Customer Covariates

Personalized pricing analytics is becoming an essential tool in retailing. Upon observing the personalized information of each arriving customer, such as income, educational background, and past purchasing history, the firm sets a price based on these covariates to extract more revenue. For new entrants to the business, the lack of historical data may severely limit the power and profitability of personalized pricing. We propose a nonparametric pricing policy that simultaneously learns customer preferences from the covariates and maximizes the expected revenue over a finite horizon. The policy does not depend on any prior assumption about how the personalized information affects consumers' preferences (such as a linear model). It adaptively splits the covariate space into smaller bins (hyper-rectangles) and clusters customers based on their covariates and preferences, offering similar prices to customers in the same cluster and thereby trading off granularity against accuracy. We show that the algorithm achieves a regret of order $O(\log(T)^2 T^{(2+d)/(4+d)})$, where $T$ is the length of the horizon and $d$ is the dimension of the covariate. This improves upon the best regret bound currently in the literature \citep{slivkins2014contextual} under mild technical conditions in the pricing context (smoothness and local concavity). We also prove that, for a particular instance, no policy can achieve a regret of smaller order than $\Omega(T^{(2+d)/(4+d)})$, which demonstrates the near optimality of the proposed policy.
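The adaptive binning idea can be illustrated with a small simulation. The Python sketch below is a simplified, hypothetical variant and not the authors' policy: covariates are assumed to lie in $[0,1]^d$, each bin explores a fixed grid of prices round-robin, and once a bin has served `samples_per_split` customers it keeps at most half of its price interval around the empirically best grid price and splits into $2^d$ children that inherit the narrowed interval. The names `Bin`, `AdaptiveBinningPricer`, and `simulate`, the parameters `n_grid`, `samples_per_split`, and `max_depth`, and the linear demand model are all assumptions made for illustration.

```python
import itertools
import random


class Bin:
    """A hyper-rectangle of covariate space with its own candidate price interval."""

    def __init__(self, lower, upper, p_lo, p_hi, n_grid):
        self.lower, self.upper = lower, upper   # opposite corners of the rectangle
        self.p_lo, self.p_hi = p_lo, p_hi       # current candidate price interval
        step = (p_hi - p_lo) / (n_grid - 1)
        self.grid = [p_lo + k * step for k in range(n_grid)]
        self.revenue = [0.0] * n_grid           # cumulative revenue per grid price
        self.pulls = [0] * n_grid               # times each grid price was offered
        self.children = None                    # filled in once the bin splits

    def contains(self, x):
        return all(lo <= xi <= hi for xi, lo, hi in zip(x, self.lower, self.upper))

    def split(self, n_grid):
        """Narrow the price interval around the best grid price, then split into 2^d children."""
        means = [r / n if n else 0.0 for r, n in zip(self.revenue, self.pulls)]
        best = max(range(len(self.grid)), key=means.__getitem__)
        half_width = (self.p_hi - self.p_lo) / 4  # keep at most half of the interval
        lo = max(self.p_lo, self.grid[best] - half_width)
        hi = min(self.p_hi, self.grid[best] + half_width)
        mids = [(a + b) / 2 for a, b in zip(self.lower, self.upper)]
        self.children = []
        for corner in itertools.product((0, 1), repeat=len(mids)):
            lower = [a if c == 0 else m for a, m, c in zip(self.lower, mids, corner)]
            upper = [m if c == 0 else b for m, b, c in zip(mids, self.upper, corner)]
            self.children.append(Bin(lower, upper, lo, hi, n_grid))


class AdaptiveBinningPricer:
    """Hypothetical simplified policy: explore prices per bin, narrow, and split."""

    def __init__(self, d, p_lo, p_hi, n_grid=5, samples_per_split=200, max_depth=3):
        self.root = Bin([0.0] * d, [1.0] * d, p_lo, p_hi, n_grid)
        self.n_grid = n_grid
        self.samples_per_split = samples_per_split
        self.max_depth = max_depth

    def offer(self, x):
        """Locate the leaf bin containing x and pick its least-tried grid price."""
        node, depth = self.root, 0
        while node.children is not None:
            node = next(c for c in node.children if c.contains(x))
            depth += 1
        k = min(range(self.n_grid), key=node.pulls.__getitem__)  # round-robin exploration
        return node, depth, k

    def record(self, leaf, depth, k, revenue):
        leaf.pulls[k] += 1
        leaf.revenue[k] += revenue
        if depth < self.max_depth and sum(leaf.pulls) >= self.samples_per_split:
            leaf.split(self.n_grid)

    def leaves(self):
        stack, out = [self.root], []
        while stack:
            node = stack.pop()
            if node.children is None:
                out.append(node)
            else:
                stack.extend(node.children)
        return out


def simulate(T=5000, d=2, seed=0):
    rng = random.Random(seed)
    pricer = AdaptiveBinningPricer(d, p_lo=0.5, p_hi=3.0)
    total, offered = 0.0, []
    for _ in range(T):
        x = [rng.random() for _ in range(d)]
        leaf, depth, k = pricer.offer(x)
        p = leaf.grid[k]
        # Hypothetical demand: buy with probability 1 - p / (2 v), where v = 1 + mean(x).
        v = 1.0 + sum(x) / d
        revenue = p if rng.random() < max(0.0, 1.0 - p / (2.0 * v)) else 0.0
        pricer.record(leaf, depth, k, revenue)
        total += revenue
        offered.append(p)
    return pricer, total, offered


pricer, total, offered = simulate()
print(len(pricer.leaves()), round(total, 1))
```

Deeper bins describe customers more precisely but receive fewer of them, so the split threshold is where the granularity-versus-accuracy trade-off is made; in this sketch the grid size and threshold are fixed constants rather than tuned to $T$ and $d$.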

[1] Benjamin Van Roy, et al. Dynamic Pricing with a Prior on Market Response, 2010, Oper. Res.

[2] A. Zeevi, et al. Woodroofe's One-Armed Bandit Problem Revisited, 2009, arXiv:0909.0119.

[3] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.

[4] Mohsen Bayati, et al. Online Decision-Making with High-Dimensional Covariates, 2015.

[5] Csaba Szepesvári, et al. $\mathcal{X}$-Armed Bandits, 2022.

[6] Wang Chi Cheung, et al. Dynamic Pricing and Demand Learning with Limited Price Experimentation, 2017.

[7] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.

[8] Bert Zwart, et al. Simultaneously Learning and Optimizing Using Controlled Variance Pricing, 2014, Manag. Sci.

[9] David Simchi-Levi, et al. Dynamic Learning and Price Optimization with Endogeneity Effect, 2016.

[10] Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.

[11] Near-Optimal Bisection Search for Nonparametric Dynamic Pricing with Inventory Constraint, 2014.

[12] Adel Javanmard, et al. Dynamic Pricing in High-Dimensions, 2016, J. Mach. Learn. Res.

[13] N. B. Keskin, et al. Personalized Dynamic Pricing with Machine Learning: High Dimensional Features and Heterogeneous Elasticity, 2020.

[14] Josef Broder, et al. Dynamic Pricing Under a General Parametric Choice Model, 2012, Oper. Res.

[15] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.

[16] Vianney Perchet, et al. The multi-armed bandit problem with covariates, 2011, arXiv.

[17] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.

[18] Renato Paes Leme, et al. Feature-based Dynamic Pricing, 2016, EC.

[19] Philippe Rigollet, et al. Nonparametric Bandits with Covariates, 2010, COLT.

[20] Mohsen Bayati, et al. Dynamic Pricing with Demand Covariates, 2016, arXiv:1604.07463.

[21] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[22] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[23] Yuhong Yang, et al. Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates, 2002.

[24] Zizhuo Wang, et al. Close the Gaps: A Learning-While-Doing Algorithm for Single-Product Revenue Management Problems, 2014, Oper. Res.

[25] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.

[26] Omar Besbes, et al. Blind Network Revenue Management, 2011, Oper. Res.

[27] Aleksandrs Slivkins, et al. Contextual Bandits with Similarity Information, 2009, COLT.

[28] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[29] Victor F. Araman, et al. Dynamic Pricing for Nonperishable Products with Demand Learning, 2009, Oper. Res.

[30] G. Gallego, et al. Revenue Management and Pricing Analytics, 2019, International Series in Operations Research & Management Science.

[31] Assaf J. Zeevi, et al. Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies, 2014, Oper. Res.

[32] Guillermo Gallego, et al. A Primal-dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint, 2018, Math. Oper. Res.

[33] Doina Precup, et al. Algorithms for multi-armed bandit problems, 2014, arXiv.

[34] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.

[35] A. V. den Boer, et al. Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions, 2013.

[36] Omar Besbes, et al. Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms, 2009, Oper. Res.

[37] A. Zeevi, et al. A Linear Response Bandit Problem, 2013.

[38] Holger Rauhut, et al. A Mathematical Introduction to Compressive Sensing, 2013, Applied and Numerical Harmonic Analysis.