Weighted reward criteria in Competitive Markov Decision Processes

We consider Competitive Markov Decision Processes in which the controllers (players) are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria. These criteria are convex combinations either of two discounted objectives, or of one discounted and one limiting average reward objective. In both cases we establish the existence of the game-theoretic value vector and describe ε-optimal non-stationary strategies.
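
For concreteness, the two weighted criteria can be written out as follows. This is a sketch only: the notation is assumed for illustration and is not taken from the abstract. In particular, the symbols $R_t$, $\lambda$, $\beta$, $\beta_1$, $\beta_2$, $\pi$, $\sigma$, the $(1-\beta)$ normalization of the discounted reward, and the use of $\liminf$ in the limiting average are all assumptions.

```latex
% Assumed notation: R_t(s,\pi,\sigma) is the expected reward at stage t when play
% starts in state s, player 1 uses strategy \pi and player 2 uses strategy \sigma.
% Weight \lambda \in [0,1]; discount factors \beta, \beta_1, \beta_2 \in [0,1).
\begin{align*}
  % Normalized discounted objective (normalization is an assumption)
  V_{\beta}(s,\pi,\sigma) &= (1-\beta)\sum_{t=0}^{\infty} \beta^{t} R_t(s,\pi,\sigma), \\
  % Limiting average objective (liminf is an assumption)
  V_{a}(s,\pi,\sigma) &= \liminf_{T\to\infty} \frac{1}{T+1}\sum_{t=0}^{T} R_t(s,\pi,\sigma), \\
  % Criterion 1: convex combination of two discounted objectives
  W_{1}(s,\pi,\sigma) &= \lambda\, V_{\beta_1}(s,\pi,\sigma) + (1-\lambda)\, V_{\beta_2}(s,\pi,\sigma), \\
  % Criterion 2: convex combination of a discounted and a limiting average objective
  W_{2}(s,\pi,\sigma) &= \lambda\, V_{\beta}(s,\pi,\sigma) + (1-\lambda)\, V_{a}(s,\pi,\sigma).
\end{align*}
% Existence of the value vector for a criterion W means that, for every state s,
\[
  \operatorname{val}(s) = \sup_{\pi}\inf_{\sigma} W(s,\pi,\sigma)
                        = \inf_{\sigma}\sup_{\pi} W(s,\pi,\sigma),
\]
% and a strategy \pi_{\varepsilon} of player 1 is \varepsilon-optimal if
\[
  \inf_{\sigma} W(s,\pi_{\varepsilon},\sigma) \ge \operatorname{val}(s) - \varepsilon
  \quad \text{for all states } s.
\]
```

Under this reading, the discounted term weights early stages of play and the limiting average term weights the long-run behaviour, which is consistent with the name “horizon-sensitive”.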
