Solving Controlled Markov Set-Chains With Discounting via Multipolicy Improvement

We consider Markov decision processes (MDPs) where the state transition probability distributions are not uniquely known, but are known to belong to some intervals (so-called "controlled Markov set-chains"), with infinite-horizon discounted reward criteria. We present formal methods to improve multiple policies for solving such controlled Markov set-chains. Our multipolicy improvement methods follow the spirit of parallel rollout and policy switching for solving MDPs. In particular, these methods are useful for online control of Markov set-chains and for designing policy iteration (PI) type algorithms. We develop a PI-type algorithm and prove that it converges to an optimal policy.
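As an illustration of the policy-switching idea mentioned above, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a pessimistic (worst-case) discounted value over the transition intervals, with intervals chosen independently per state-action pair. All function names and the two-state example data are hypothetical. Each base policy is evaluated under the worst-case transition law, and the switched policy takes, in each state, the action of the base policy with the largest worst-case value there.

```python
def worst_case_expectation(lo, hi, v):
    """Minimise sum(p[j] * v[j]) over lo <= p <= hi with sum(p) == 1.

    Start from the lower bounds and give the remaining probability mass
    to successor states in ascending order of value.
    """
    p = list(lo)
    slack = 1.0 - sum(lo)
    for j in sorted(range(len(v)), key=lambda j: v[j]):
        add = min(hi[j] - lo[j], slack)
        p[j] += add
        slack -= add
    return sum(pj * vj for pj, vj in zip(p, v))


def evaluate_pessimistic(policy, reward, lo, hi, gamma, iters=2000):
    """Worst-case discounted value of a stationary policy (fixed-point iteration)."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            reward[s][policy[s]]
            + gamma * worst_case_expectation(lo[s][policy[s]], hi[s][policy[s]], v)
            for s in range(n)
        ]
    return v


def policy_switching(policies, values):
    """In each state, follow the base policy with the largest value in that state."""
    n = len(policies[0])
    switched = []
    for s in range(n):
        best = max(range(len(policies)), key=lambda i: values[i][s])
        switched.append(policies[best][s])
    return switched


# Hypothetical two-state, two-action example (all numbers are made up).
# reward[s][a]; lo[s][a][s2] and hi[s][a][s2] bound P(s2 | s, a).
reward = [[1.0, 0.0], [0.0, 2.0]]
lo = [[[0.6, 0.2], [0.1, 0.7]], [[0.5, 0.3], [0.2, 0.6]]]
hi = [[[0.8, 0.4], [0.3, 0.9]], [[0.7, 0.5], [0.4, 0.8]]]
gamma = 0.9

policies = [[0, 0], [1, 1]]
values = [evaluate_pessimistic(pi, reward, lo, hi, gamma) for pi in policies]
pi_switched = policy_switching(policies, values)
v_switched = evaluate_pessimistic(pi_switched, reward, lo, hi, gamma)
print(pi_switched, v_switched)
```

Under these assumptions, the worst-case value of the switched policy is no smaller, in every state, than the worst-case value of any base policy. Repeating the step with a changing set of base policies is one way to read the "PI-type" use mentioned in the abstract.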
