MARKOV DECISION PROCESSES WITH UNCERTAIN TRANSITION RATES: SENSITIVITY AND MAX-MIN CONTROL

Solution techniques for Markov decision problems rely on exact knowledge of the transition rates, which may be difficult or impossible to obtain. In this paper, we consider Markov decision problems whose uncertain transition rates are represented as compact sets. We first consider the problem of sensitivity analysis, where the aim is to quantify the range of uncertainty of the average per-unit-time reward given the range of uncertainty of the transition rates. We then develop solution techniques for the problem of obtaining the max-min optimal policy, which maximizes the worst-case average per-unit-time reward. In each of these problems, we distinguish between systems whose transition rates can be chosen independently and those whose transition rates depend on one another. Our solution techniques are applicable to Markov decision processes with fixed but unknown transition rates as well as to those with time-varying transition rates.
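
To make the max-min objective concrete, the following Python sketch evaluates the long-run average per-unit-time reward of each stationary deterministic policy under a finite set of transition-rate scenarios (e.g., the endpoints of interval-bounded rates standing in for a compact uncertainty set) and picks the policy with the best worst-case value. This is a brute-force illustration under assumed conditions (small finite state and action spaces, irreducible chains, finitely many scenarios), not the solution technique developed in the paper; the function names, rate scenarios, and numerical values below are hypothetical.

import itertools
import numpy as np


def stationary_distribution(Q):
    # Solve pi Q = 0 together with sum(pi) = 1 for the stationary
    # distribution of an irreducible generator matrix Q.
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def average_reward(policy, rates, rewards):
    # Long-run average per-unit-time reward of a stationary deterministic
    # policy under one transition-rate scenario.
    # rates[a][i][j]: rate from state i to state j (j != i) under action a.
    # rewards[i][a]: reward earned per unit time in state i under action a.
    n = len(policy)
    Q = np.zeros((n, n))
    for i, a in enumerate(policy):
        for j in range(n):
            if j != i:
                Q[i, j] = rates[a][i][j]
        Q[i, i] = -Q[i].sum()
    pi = stationary_distribution(Q)
    return sum(pi[i] * rewards[i][policy[i]] for i in range(n))


def max_min_policy(scenarios, rewards, n_states, n_actions):
    # Enumerate all stationary deterministic policies and return the one
    # whose worst-case average reward over the rate scenarios is largest.
    best_policy, best_value = None, -np.inf
    for policy in itertools.product(range(n_actions), repeat=n_states):
        worst = min(average_reward(policy, rates, rewards) for rates in scenarios)
        if worst > best_value:
            best_policy, best_value = policy, worst
    return best_policy, best_value


# Hypothetical two-state, two-action example; the two scenarios stand in for
# the endpoints of interval-bounded transition rates (numbers are made up).
scenarios = [
    np.array([[[0.0, 1.0], [2.0, 0.0]],    # rates under action 0
              [[0.0, 3.0], [1.0, 0.0]]]),  # rates under action 1
    np.array([[[0.0, 1.5], [1.0, 0.0]],
              [[0.0, 2.0], [0.5, 0.0]]]),
]
rewards = np.array([[1.0, 0.5],   # state 0: reward rates under actions 0 and 1
                    [3.0, 4.0]])  # state 1
policy, value = max_min_policy(scenarios, rewards, n_states=2, n_actions=2)
print(policy, value)

Exhaustive enumeration of policies and scenario vertices grows exponentially with the number of states and transition rates, so it only serves to illustrate the max-min criterion; the paper's methods are what make the problem tractable.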
