Value iteration and optimization of multiclass queueing networks
This paper considers in parallel the scheduling problem for multiclass queueing networks and the optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when it is not initialized properly. If the algorithm is initialized with a stochastic Lyapunov function, then convergence is guaranteed and each policy generated by the algorithm is stabilizing. For the network scheduling problem, it is argued that a natural choice for the initial value function is either the value function for the associated deterministic control problem based on a fluid model, or the approximate solution to Poisson's equation obtained from the LP of Kumar and Meyn (1996). Numerical studies show that either choice may lead to fast convergence to an optimal policy.
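To make the role of the initial value function concrete, the following is a minimal sketch of value iteration on a toy scheduling problem. It is not the paper's experiment. The two-class single-server queue, the buffer truncation `N`, the rates, the helper names `two_class_queue` and `value_iteration`, and the quadratic workload function used as a stand-in for a fluid-model value function are all assumptions chosen for illustration.

```python
import numpy as np


def two_class_queue(N, a1, a2, m1, m2):
    """Uniformized two-class, single-server queue (hypothetical toy model).

    Buffers are truncated at N. Action 0 serves class 1, action 1 serves
    class 2. The rates a1, a2 (arrivals) and m1, m2 (service) should sum to 1.
    Returns P of shape (2, S, S), the cost c(x) = x1 + x2, and the state
    coordinates x1, x2, each of shape (S,).
    """
    S = (N + 1) ** 2

    def idx(i, j):
        return i * (N + 1) + j

    P = np.zeros((2, S, S))
    x1 = np.zeros(S)
    x2 = np.zeros(S)
    for i in range(N + 1):
        for j in range(N + 1):
            s = idx(i, j)
            x1[s], x2[s] = i, j
            for a in (0, 1):
                # Arrivals; an arrival to a full buffer is blocked.
                P[a, s, idx(min(i + 1, N), j)] += a1
                P[a, s, idx(i, min(j + 1, N))] += a2
                if a == 0:
                    # Serve class 1; the unused service rate is a self-loop.
                    P[a, s, idx(max(i - 1, 0), j)] += m1
                    P[a, s, s] += m2
                else:
                    # Serve class 2; the unused service rate is a self-loop.
                    P[a, s, idx(i, max(j - 1, 0))] += m2
                    P[a, s, s] += m1
    return P, x1 + x2, x1, x2


def value_iteration(P, c, V0, n_iters):
    """Iterate V_{n+1}(x) = min_a [ c(x) + sum_y P_a(x, y) V_n(y) ].

    Returns the final V and the greedy policy from the last step.
    """
    V = np.asarray(V0, dtype=float).copy()
    policy = np.zeros(len(c), dtype=int)
    for _ in range(n_iters):
        Q = c[None, :] + P @ V      # shape (A, S): one row per action
        policy = Q.argmin(axis=0)
        V = Q.min(axis=0)
    return V, policy


# Illustrative rates (assumed, not from the paper); they sum to 1.
a1, a2, m1, m2 = 0.1, 0.1, 0.5, 0.3
P, c, x1, x2 = two_class_queue(10, a1, a2, m1, m2)

# Initialization 1: the zero function.
V_zero, pol_zero = value_iteration(P, c, np.zeros_like(c), 50)

# Initialization 2: a quadratic in the workload, used here as a rough
# stand-in for a fluid-model value function (an assumption of this sketch).
rho = a1 / m1 + a2 / m2
V0_quad = 0.5 * (x1 / m1 + x2 / m2) ** 2 / (1.0 - rho)
V_quad, pol_quad = value_iteration(P, c, V0_quad, 50)

print("states where the two greedy policies differ:",
      int((pol_zero != pol_quad).sum()))
```

Comparing `pol_zero` and `pol_quad` after the same number of iterations gives a rough sense of how the initialization shapes the intermediate policies. The sketch makes no claim about which initialization converges faster on this toy model.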