Paper Information - Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms

Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms

We consider a dynamic pricing problem under unknown demand models. In this problem, a seller offers prices to a stream of customers and observes either success or failure in each sale attempt. The underlying demand model is unknown to the seller and can take one of N possible forms. In this paper, we show that this problem can be formulated as a multi-armed bandit with dependent arms. We propose a dynamic pricing policy based on the likelihood ratio test. We show that the proposed policy achieves complete learning, i.e., it achieves bounded regret, where regret is defined as the revenue loss relative to the case of a known demand model. This is in sharp contrast with the logarithmically growing regret of multi-armed bandits with independent arms.
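The setting in the abstract can be illustrated with a minimal sketch. It is not the paper's policy: the abstract only says the policy is based on the likelihood ratio test, so the code below substitutes a simple greedy maximum-likelihood rule (offer the revenue-maximizing price of whichever candidate model currently has the highest log-likelihood). The two linear demand curves, the price grid, and all function names are hypothetical and chosen only so that the candidates predict different sale probabilities at every price on the grid.

```python
import math
import random

# Hypothetical candidate demand models: price -> probability of a sale.
MODELS = [
    lambda p: 0.9 - 0.5 * p,
    lambda p: 0.5 - 0.1 * p,
]
PRICES = [0.4, 0.8, 1.2]  # hypothetical finite price grid


def best_price(model, prices):
    """Price maximizing expected revenue p * P(sale | p) under `model`."""
    return max(prices, key=lambda p: p * model(p))


def run_greedy_ml_pricing(models, true_index, prices, horizon, seed=0):
    """Simulate a greedy maximum-likelihood pricing rule (an assumption,
    not the policy from the paper) and return (final_guess, regret)."""
    rng = random.Random(seed)
    log_lik = [0.0] * len(models)
    true_model = models[true_index]
    optimal = best_price(true_model, prices)
    optimal_revenue = optimal * true_model(optimal)
    regret = 0.0
    for _ in range(horizon):
        # Act as if the currently most likely model were the true one.
        guess = max(range(len(models)), key=lambda i: log_lik[i])
        price = best_price(models[guess], prices)
        sold = rng.random() < true_model(price)
        # One observation updates every candidate's likelihood,
        # which is what makes the arms (prices) dependent.
        for i, m in enumerate(models):
            q = min(max(m(price), 1e-12), 1 - 1e-12)
            log_lik[i] += math.log(q if sold else 1 - q)
        # Regret: expected revenue lost versus knowing the true model.
        regret += optimal_revenue - price * true_model(price)
    final_guess = max(range(len(models)), key=lambda i: log_lik[i])
    return final_guess, regret
```

A call such as `run_greedy_ml_pricing(MODELS, true_index=1, prices=PRICES, horizon=2000)` returns the index of the most likely model and the accumulated expected revenue loss. Because every sale attempt is informative about all N candidates at once, the loss stops accumulating once the true model is identified, which is the intuition behind bounded rather than logarithmic regret.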

Qing Zhao | Pouya Tehrani | Yixuan Zhai

[1] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[2] N. Kiefer, et al. Controlling a Stochastic Process with Unknown Parameters, 1988.

[3] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[4] Don H. Johnson, et al. Symmetrizing the Kullback-Leibler Distance, 2001.

[5] J. Michael Harrison, et al. Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution, 2011, Manag. Sci.

[6] Omar Besbes, et al. Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms, 2009, Oper. Res.

[7] Frank Thomson Leighton, et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[8] Eric W. Cope, et al. Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems, 2009, IEEE Transactions on Automatic Control.

[9] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.

[10] J. Gittins. Bandit processes and dynamic allocation indices, 1979.

[11] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.