Paper information - Online Reinforcement Learning in Stochastic Games

Online Reinforcement Learning in Stochastic Games

We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the UCSG algorithm, which achieves sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves on previous ones under the same setting. The regret bound depends on the diameter, an intrinsic value related to the mixing property of SGs. If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of Õ(poly(1/ε)), where ε is the gap to the best policy.
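The two guarantees named in the abstract can be written out roughly as follows. This is a sketch in notation assumed here rather than taken from the paper: ρ* denotes the game value (assumed not to depend on the starting state), r(s_t, a_t^1, a_t^2) the learner's payoff at step t, and ρ(π^1, π^2) the long-run average payoff when the learner and the opponent follow stationary policies π^1 and π^2.

```latex
% Regret of the learner after T steps, measured against the game value
% (notation assumed for illustration):
\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \bigl( \rho^* - r(s_t, a_t^1, a_t^2) \bigr),
\qquad \text{sublinear regret: } \frac{\mathrm{Reg}_T}{T} \to 0 \text{ as } T \to \infty .

% A stationary learner policy \pi^1 is \varepsilon-maximin if its
% worst-case average payoff is within \varepsilon of the game value:
\min_{\pi^2} \rho(\pi^1, \pi^2) \;\ge\; \rho^* - \varepsilon .
```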

Chi-Jen Lu | Chen-Yu Wei | Yi-Te Hong

[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[2] Arie Hordijk, et al. Dynamic programming and Markov potential theory, 1974.

[3] A. Federgruen. On N-person stochastic games by denumerable state space, 1978, Advances in Applied Probability.

[4] J. Wal, et al. Successive approximations for average reward Markov games, 1980.

[5] J. Hunter. Generalized inverses and their application to applied probability problems, 1982.

[6] J. Hunter, et al. Stationary Distributions and Mean First Passage Times of Perturbed Markov Chains, 1992.

[7] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.

[8] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[9] C. D. Meyer, et al. Markov chain sensitivity measured by mean first passage times, 2000.

[10] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.

[11] Michail G. Lagoudakis, et al. Value Function Approximation in Zero-Sum Markov Games, 2002, UAI.