Bayesian Dynamic Pricing in Queueing Systems with Unknown Delay Cost Characteristics

The revenue management literature for queues typically assumes that providers know the distribution of customer demand attributes. We study an observable M/M/1 queue that serves an unknown proportion of patient and impatient customers. The provider has a Bernoulli prior on this proportion, corresponding to an optimistic or a pessimistic scenario. For every queue length, she chooses a low or a high price, or turns customers away. Only the high price is informative. The optimal Bayesian price for a queue state is belief-dependent if the optimal policies for the underlying scenarios disagree at that queue state; in this case the policy has a belief-threshold structure. As a function of queue length, the optimal Bayesian pricing policy has a zone (or nested-threshold) structure. Moreover, price convergence under the optimal Bayesian policy is sensitive to the system size, i.e., the maximum queue length. We identify two cases: prices converge (1) almost surely to the optimal prices in either scenario, or (2) with positive probability to suboptimal prices. Only Case 2 is consistent with the typical incomplete learning outcome observed in the literature.
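To illustrate why only the high price is informative, the following minimal sketch updates the provider's belief after a single arrival. It rests on assumptions that are not stated in the abstract: that both customer types join at the low price, that only patient customers join at the high price, and that the two scenarios correspond to hypothetical patient-customer fractions `q_opt` and `q_pes`. The function name and the numerical values are illustrative only.

```python
def update_belief(belief, price, joined, q_opt=0.8, q_pes=0.4):
    """Return the posterior probability of the optimistic scenario.

    belief : prior probability of the optimistic scenario
    price  : "low", "high", or "reject" (action taken at the current queue length)
    joined : True if the arriving customer joined the queue
    q_opt, q_pes : hypothetical fractions of patient customers in the
                   optimistic and pessimistic scenarios (assumed values)
    """
    if price != "high":
        # Assumption: at the low price every customer joins, and a rejected
        # customer reveals nothing, so the outcome has the same likelihood
        # in both scenarios and the belief does not move.
        return belief

    # Assumption: at the high price only patient customers join, so the
    # join decision is a Bernoulli draw with a scenario-dependent parameter.
    like_opt = q_opt if joined else 1.0 - q_opt
    like_pes = q_pes if joined else 1.0 - q_pes

    # Bayes' rule over the two scenarios.
    numerator = belief * like_opt
    return numerator / (numerator + (1.0 - belief) * like_pes)


if __name__ == "__main__":
    b = 0.5
    for price, joined in [("low", True), ("high", True), ("high", False)]:
        b = update_belief(b, price, joined)
        print(price, joined, round(b, 4))
```

Under these assumptions the belief stays fixed whenever the low price is charged, which is how learning can stall: if the policy stops charging the high price, no further information arrives.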
