Paper information - Weighted reward criteria in Competitive Markov Decision Processes

Weighted reward criteria in Competitive Markov Decision Processes

We consider Competitive Markov Decision Processes in which the controllers/players are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria. These are either a convex combination of two discounted objectives, or of one discounted and one limiting average reward objective. In both cases we establish the existence of the game-theoretic value vector, and supply a description of ε-optimal non-stationary strategies.
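As a reading aid, the two weighted criteria named in the abstract can be sketched as follows. The notation is an assumption and is not taken from the paper: $r_t$ is the stage-$t$ reward, $(\pi,\sigma)$ is a pair of strategies for the two players, $s$ is the initial state, $\lambda \in [0,1]$ is the weight, and $\beta, \beta_1, \beta_2 \in [0,1)$ are discount factors. The normalizing factors $(1-\beta)$, which put the discounted and average rewards on a common scale, are likewise an assumption; the abstract only states that the criteria are convex combinations.

```latex
% Discounted and limiting average objectives (assumed notation)
\phi_{\beta}(s,\pi,\sigma) = \sum_{t=0}^{\infty} \beta^{t}\,
    \mathbb{E}_{s,\pi,\sigma}\!\left[ r_t \right],
\qquad
\phi(s,\pi,\sigma) = \liminf_{T\to\infty} \frac{1}{T+1}
    \sum_{t=0}^{T} \mathbb{E}_{s,\pi,\sigma}\!\left[ r_t \right].

% Case 1: convex combination of two discounted objectives
\omega_{1}(s,\pi,\sigma) = \lambda\,(1-\beta_1)\,\phi_{\beta_1}(s,\pi,\sigma)
    + (1-\lambda)\,(1-\beta_2)\,\phi_{\beta_2}(s,\pi,\sigma).

% Case 2: one discounted and one limiting average objective
\omega_{2}(s,\pi,\sigma) = \lambda\,(1-\beta)\,\phi_{\beta}(s,\pi,\sigma)
    + (1-\lambda)\,\phi(s,\pi,\sigma).
```

Under this sketch, the discounted term emphasizes rewards earned early while the limiting average term depends only on long-run behavior, which is one way to read the term “horizon-sensitive”.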

Jerzy A. Filar | O. J. Vrieze | J. Filar

[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[2] Dean Gillette, et al. Stochastic Games with Zero Stop Probabilities, 1958.

[3] D. Blackwell, et al. The Big Match, 1968, Classics in Game Theory.

[4] A. Federgruen. On N-person stochastic games with denumerable state space, 1978, Advances in Applied Probability.

[5] Jerzy A. Filar, et al. A Weighted Markov Decision Process, 1992, Oper. Res.