Dynamic Pricing Decision for Perishable Goods: A Q-Learning Approach

In this paper, we considered a dynamic pricing problem for selling a given stock of perishable items during a finite sale season. We developed a partially observed Markov decision process (POMDP) model to study this problem. In particular, belief states were adopted to handle the uncertainty in demand. A Q-learning approach was designed to obtain an optimal dynamic pricing policy, and this approach was validated by a simulation experiment.
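
To make the ingredients named above concrete, the following is a minimal sketch of how a belief state and a tabular Q-learning update might fit together in a finite-season pricing problem. It is an illustration under stated assumptions, not the model or algorithm of this paper: the price menu, the two-type ("low"/"high") demand model, the purchase probabilities, the belief discretization, and all names (`PRICES`, `BUY_PROB`, `update_belief`, `bin_of`, `train`) are hypothetical and chosen only for the example.

```python
import random

# All numbers below are hypothetical and serve only to illustrate the idea.
PRICES = [1.0, 2.0, 3.0]  # assumed price menu
# Assumed demand model: probability that the single customer arriving in a
# period buys at each price, under two unobserved market types.
BUY_PROB = {"low": [0.6, 0.3, 0.1], "high": [0.9, 0.7, 0.4]}
T, STOCK, BINS = 10, 5, 10  # periods in the season, initial stock, belief grid


def update_belief(b_high, price_idx, sold):
    """Bayes update of P(market type is 'high') after a sale or a no-sale."""
    ph = BUY_PROB["high"][price_idx]
    pl = BUY_PROB["low"][price_idx]
    like_h = ph if sold else 1.0 - ph
    like_l = pl if sold else 1.0 - pl
    return b_high * like_h / (b_high * like_h + (1.0 - b_high) * like_l)


def bin_of(b_high):
    """Map a belief in [0, 1] to one of BINS grid cells."""
    return min(int(b_high * BINS), BINS - 1)


def train(episodes=20000, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning over states (period, stock, belief cell)."""
    rng = random.Random(seed)
    q = {}  # state -> list of action values, one per price
    for _ in range(episodes):
        market = rng.choice(["low", "high"])  # hidden from the seller
        stock, b = STOCK, 0.5                 # uniform prior belief
        for t in range(T):
            if stock == 0:
                break
            qs = q.setdefault((t, stock, bin_of(b)), [0.0] * len(PRICES))
            # Epsilon-greedy choice of a price index.
            if rng.random() < eps:
                a = rng.randrange(len(PRICES))
            else:
                a = max(range(len(PRICES)), key=qs.__getitem__)
            sold = rng.random() < BUY_PROB[market][a]
            reward = PRICES[a] if sold else 0.0
            stock -= int(sold)
            b = update_belief(b, a, sold)
            # Unsold items are worthless after the season, so the target is
            # the immediate reward at the end of the season or when sold out.
            if t + 1 == T or stock == 0:
                target = reward
            else:
                nxt = q.setdefault((t + 1, stock, bin_of(b)),
                                   [0.0] * len(PRICES))
                target = reward + max(nxt)
            qs[a] += alpha * (target - qs[a])
    return q
```

After training, a greedy price for a state can be read off by taking the index of the largest entry of `q[(t, stock, bin_of(b))]`. The sketch uses no discounting because the season is finite, and it discretizes the belief so that a lookup table suffices; both are simplifying choices made for the example.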