We present a new multiagent learning algorithm, RV_σ(t), that can guarantee both no-regret performance (in all games) and policy convergence (in some games of arbitrary size). Unlike its predecessor ReDVaLeR, it (1) does not need to distinguish whether its opponents are self-play or otherwise non-stationary, and (2) is allowed to know its portion of any equilibrium that, we argue, leads to convergence in some games in addition to no-regret. Although the regret of RV_σ(t) is analyzed in continuous time, we show that it grows more slowly than in other no-regret techniques such as GIGA and GIGA-WoLF. We also show that RV_σ(t) can converge to coordinated behavior in coordination games, whereas GIGA and GIGA-WoLF may converge to poorly coordinated (mixed) behaviors.
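Because RV_σ(t) is benchmarked against GIGA [4] and GIGA-WoLF [2], a minimal sketch of the GIGA-style projected-gradient update may help situate the comparison. The sketch below assumes a two-player matrix game in which the learner observes (or estimates) the opponent's mixed strategy; the function names, the payoff-matrix argument, and the 1/sqrt(t) step size are illustrative assumptions, not the authors' exact formulation of RV_σ(t).

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_update(x, payoff_matrix, opponent_y, t):
    """One GIGA-style step: gradient ascent on expected payoff, then projection.

    Illustrative sketch only; parameter names and the step-size schedule are assumptions.
    x            : own mixed strategy (probability vector over actions)
    payoff_matrix: own payoff matrix A, so expected payoff is x^T A y
    opponent_y   : opponent's observed/estimated mixed strategy
    t            : time step, used for the decaying step size 1/sqrt(t)
    """
    grad = payoff_matrix @ opponent_y      # gradient of x^T A y with respect to x
    eta = 1.0 / np.sqrt(t)                 # decaying step size
    return project_to_simplex(x + eta * grad)
```

Algorithms in this family differ mainly in how the step size is modulated (e.g., GIGA-WoLF's win-or-learn-fast adjustment), which is where RV_σ(t)'s guarantees of both no-regret and convergence come into play.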
[1] Bikramjit Banerjee et al. Performance Bounded Reinforcement Learning in Strategic Interactions. AAAI, 2004.
[2] Michael H. Bowling et al. Convergence and No-Regret in Multiagent Learning. NIPS, 2004.
[3] Vincent Conitzer et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 2003.
[4] Martin Zinkevich et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML, 2003.
[5] Manuela M. Veloso et al. Multiagent learning using a variable learning rate. Artificial Intelligence, 2002.