Learning Algorithms for Two-Person Zero-Sum Stochastic Games with Incomplete Information: A Unified Approach
This paper extends recent results [Lakshmivarahan and Narendra, Math. Oper. Res., 6 (1981), pp. 379–386] on two-person zero-sum sequential games in which the players use learning algorithms to update their strategies. It is assumed that neither player knows (i) the set of strategies available to the other player or (ii) the mixed strategy used by the other player or its pure realization at any stage. The game is played sequentially, and its outcome at each stage depends on chance. The distribution of the random outcome, as a function of the pair of pure strategies chosen by the players, is also unknown to them. It is shown that if the players use a learning algorithm of the reward-penalty type, then, with a proper choice of certain parameters in the algorithm, the expected values of the mixed strategies of both players can be made arbitrarily close to optimal strategies.
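As a rough illustration of the kind of update the abstract refers to, the sketch below applies the textbook linear reward-penalty scheme to two players in a game with binary (win/lose) random outcomes. It is not taken from the paper: the function names, the matrix `win_prob`, and the step sizes `a` and `b` are assumptions chosen for illustration, and the paper's exact algorithm and parameter conditions may differ.

```python
import random


def reward_penalty_update(p, chosen, rewarded, a, b):
    """One step of a linear reward-penalty update on mixed strategy p.

    a is the reward step size and b the penalty step size (both assumed
    to lie in [0, 1]); the result is again a probability vector.
    """
    r = len(p)
    if rewarded:
        # Shift probability mass toward the action that was just rewarded.
        return [pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Shift probability mass away from the action that was just penalized.
    return [(1.0 - b) * pj if j == chosen else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]


def play(win_prob, steps, a, b, seed=0):
    """Simulate repeated play of a hypothetical zero-sum game.

    win_prob[i][j] is the probability that player A wins when A plays i
    and B plays j; B wins otherwise. Neither player reads this matrix or
    the opponent's action: each sees only its own action and outcome.
    """
    rng = random.Random(seed)
    p = [1.0 / len(win_prob)] * len(win_prob)        # A's mixed strategy
    q = [1.0 / len(win_prob[0])] * len(win_prob[0])  # B's mixed strategy
    for _ in range(steps):
        i = rng.choices(range(len(p)), weights=p)[0]
        j = rng.choices(range(len(q)), weights=q)[0]
        a_wins = rng.random() < win_prob[i][j]
        p = reward_penalty_update(p, i, a_wins, a, b)
        q = reward_penalty_update(q, j, not a_wins, a, b)
    return p, q


if __name__ == "__main__":
    # Illustrative 2x2 game; these numbers are made up for the example.
    p, q = play([[0.6, 0.2], [0.35, 0.9]], steps=20000, a=0.01, b=0.001)
    print(p, q)
```

Each player's update uses only its own chosen action and its own win or loss, which mirrors the information assumptions stated in the abstract. A single run yields one sample path of the strategies; the abstract's claim concerns their expected values, so in practice one would average over many runs.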