On Multichain Markov Games

Two-person, zero-sum Markov games with arbitrary (Borel) state and action spaces, unbounded stage costs, and the average cost criterion are considered. The assumption on the transition probabilities implies an n-stage contraction property and allows the Markov chain of states under a given strategy pair to have several periodic recurrence classes, although the recurrence structure is the same for all resulting Markov chains. Under this assumption, results on the existence of ε-optimal strategies are presented.
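
For orientation, the average cost criterion and ε-optimality can be written out as follows. This is a sketch in one common formulation, not necessarily the exact definitions used in the paper: the symbols $k$ (stage cost), $\pi$ and $\sigma$ (strategies of the minimizing and maximizing player), $\Phi$ (average cost), and the choice of $\limsup$ are assumptions introduced here for illustration.

```latex
% Assumed notation: x is the initial state, (x_t, a_t, b_t) the state and
% the two players' actions at stage t, k the (possibly unbounded) stage cost.
\[
  \Phi(x,\pi,\sigma)
    \;=\; \limsup_{N\to\infty} \frac{1}{N}\,
          \mathbb{E}_x^{\pi,\sigma}\!\left[\sum_{t=0}^{N-1} k(x_t,a_t,b_t)\right].
\]
% A strategy \pi^* of the minimizing player is \varepsilon-optimal if,
% for every initial state x,
\[
  \sup_{\sigma}\,\Phi(x,\pi^{*},\sigma)
    \;\le\; \inf_{\pi}\,\sup_{\sigma}\,\Phi(x,\pi,\sigma) \;+\; \varepsilon ,
\]
% and symmetrically for the maximizing player with \sup and \inf exchanged
% and \varepsilon subtracted.
```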
