A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
[1] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.
[2] Moshe Tennenholtz, et al. Dynamic Non-Bayesian Decision Making, 1997, J. Artif. Intell. Res.
[3] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[4] O. J. Vrieze. Linear programming and undiscounted stochastic games in which one player controls transitions, 1981.
[5] T. Parthasarathy, et al. An orderfield property for stochastic games when one player controls transition probabilities, 1981.
[6] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[7] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.
[8] Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.