
Convergence, Targeted Optimality, and Safety in Multiagent Learning

In the previous chapter, we presented LoE-AIM, an algorithm that models memory-bounded agents under the assumption that their memory size is known beforehand. When such prior knowledge is unavailable, one possible solution is to use a memory size large enough to serve as a conservative upper bound on the true, unknown memory size.
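The conservative-upper-bound idea can be illustrated with a minimal sketch. This is not LoE-AIM. It uses a hypothetical frequency-count model (`MemoryBoundedModel`) that predicts an opponent's next action from the last k joint actions. The opponent is assumed to be tit-for-tat in a two-action game ("C"/"D"), so its true memory size is 1. A model using the bound k = 3 still predicts it correctly, because the true one-step context is a suffix of every three-step context. However, it has to gather statistics over more distinct contexts.

```python
import random
from collections import Counter, defaultdict


class MemoryBoundedModel:
    """Hypothetical opponent model: assumes the opponent conditions on the last k joint actions."""

    def __init__(self, k):
        self.k = k
        # context (tuple of the last k joint actions) -> counts of the opponent's next action
        self.counts = defaultdict(Counter)

    def _context(self, history):
        # history[-0:] would return the whole list, so k = 0 is handled explicitly
        return tuple(history[-self.k:]) if self.k > 0 else ()

    def update(self, history, opp_action):
        self.counts[self._context(history)][opp_action] += 1

    def predict(self, history):
        """Empirical distribution over the opponent's next action, or None for an unseen context."""
        seen = self.counts.get(self._context(history))
        if not seen:
            return None
        total = sum(seen.values())
        return {a: n / total for a, n in seen.items()}


def tit_for_tat(history):
    # Memory-1 opponent: repeat the agent's previous action, playing "C" on the first step.
    return history[-1][0] if history else "C"


def simulate(steps, ks, seed=0):
    rng = random.Random(seed)
    models = {k: MemoryBoundedModel(k) for k in ks}
    history = []  # list of (agent_action, opponent_action) joint actions
    for _ in range(steps):
        opp = tit_for_tat(history)
        # Update only once every model has a full-length context.
        if len(history) >= max(ks):
            for model in models.values():
                model.update(history, opp)
        mine = rng.choice("CD")  # the learner explores uniformly at random
        history.append((mine, opp))
    return models, history


models, history = simulate(2000, ks=(1, 3))
for k, model in models.items():
    print(k, len(model.counts), model.predict(history))
```

Both models predict the opponent's next action with probability 1.0. However, the k = 3 model spreads its samples over 16 reachable contexts instead of 4 (and up to 4^3 = 64 contexts for an arbitrary opponent in a 2x2 game). The number of contexts grows exponentially in the assumed memory size, so an overly generous bound requires more data per context before the model's estimates are reliable.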

Peter Stone | Doran Chakraborty
