Weighted Reward Criteria in Competitive Markov Decision Processes

Abstract We consider Competitive Markov Decision Processes in which the controllers/players are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria. Each such criterion is a convex combination either of two discounted objectives, or of one discounted objective and one limiting average reward objective. In both cases we establish the existence of the game-theoretic value vector and supply a description of ε-optimal non-stationary strategies.
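
For orientation, the two weighted criteria can be written out as follows. This is only a sketch under assumed notation, none of which appears in the abstract itself: s is the initial state, π and σ are the two players' strategies, r_t is the stage-t reward, λ in [0,1] is the weight, and β, β_1, β_2 in [0,1) are discount factors. The (1 - β) normalization and the use of liminf for the limiting average are likewise assumptions.

```latex
% Assumed notation (not taken from the abstract): s, \pi, \sigma, r_t, \lambda, \beta, \beta_1, \beta_2.
% Convex combination of two discounted objectives:
\[
  v_{\lambda}(s,\pi,\sigma)
    = \lambda\,(1-\beta_1)\sum_{t=0}^{\infty}\beta_1^{t}\,
        \mathbb{E}_{s,\pi,\sigma}[r_t]
    + (1-\lambda)\,(1-\beta_2)\sum_{t=0}^{\infty}\beta_2^{t}\,
        \mathbb{E}_{s,\pi,\sigma}[r_t].
\]
% Convex combination of one discounted and one limiting average objective:
\[
  w_{\lambda}(s,\pi,\sigma)
    = \lambda\,(1-\beta)\sum_{t=0}^{\infty}\beta^{t}\,
        \mathbb{E}_{s,\pi,\sigma}[r_t]
    + (1-\lambda)\,\liminf_{T\to\infty}\frac{1}{T+1}\sum_{t=0}^{T}
        \mathbb{E}_{s,\pi,\sigma}[r_t].
\]
```

Under these assumptions, the discounted term emphasizes early rewards while the second term emphasizes later (or long-run) rewards, which is one way to read the term “horizon-sensitive”.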
