Convergence, Targeted Optimality, and Safety in Multiagent Learning
[1] Yoav Shoham, et al. Learning against opponents with bounded memory, 2005, IJCAI.
[2] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[3] Richard S. Sutton, et al. Dimensions of Reinforcement Learning, 1998.
[4] Vincent Conitzer, et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, 2003, Machine Learning.
[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[6] Andrew G. Barto, et al. Reinforcement learning, 1998.
[7] Peter Stone, et al. Online Multiagent Learning against Memory Bounded Adversaries, 2008, ECML/PKDD.
[8] Yoav Shoham, et al. A general criterion and an algorithmic framework for learning in multi-agent systems, 2007, Machine Learning.
[9] Manuela M. Veloso, et al. Convergence of Gradient Dynamics with a Variable Learning Rate, 2001, ICML.
[10] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[11] Bikramjit Banerjee, et al. Performance Bounded Reinforcement Learning in Strategic Interactions, 2004, AAAI.
[12] L. Buşoniu, et al. A comprehensive survey of multi-agent reinforcement learning, 2011.