We present a new multiagent learning algorithm, RV_σ(t), that can guarantee both no-regret performance (in all games) and policy convergence (in some games of arbitrary size). Unlike its predecessor ReDVaLeR, it (1) does not need to distinguish whether its opponents are self-play or otherwise non-stationary, and (2) is allowed to know its portion of any equilibrium that, we argue, leads to convergence in some games in addition to no-regret. Although the regret of RV_σ(t) is analyzed in continuous time, we show that it grows more slowly than in other no-regret techniques such as GIGA and GIGA-WoLF. We also show that RV_σ(t) can converge to coordinated behavior in coordination games, whereas GIGA and GIGA-WoLF may converge to poorly coordinated (mixed) behaviors.
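Because RV_σ(t) is benchmarked against GIGA [4] and GIGA-WoLF [2], a minimal sketch of the GIGA-style projected-gradient update may help situate the comparison. The sketch below assumes a two-player matrix game in which the learner observes (or estimates) the opponent's mixed strategy; the function names, the payoff-matrix argument, and the 1/sqrt(t) step size are illustrative assumptions, not the authors' exact formulation of RV_σ(t).

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_update(x, payoff_matrix, opponent_y, t):
    """One GIGA-style step: gradient ascent on expected payoff, then projection.

    Illustrative sketch only; parameter names and the step-size schedule are assumptions.
    x            : own mixed strategy (probability vector over actions)
    payoff_matrix: own payoff matrix A, so expected payoff is x^T A y
    opponent_y   : opponent's observed/estimated mixed strategy
    t            : time step, used for the decaying step size 1/sqrt(t)
    """
    grad = payoff_matrix @ opponent_y      # gradient of x^T A y with respect to x
    eta = 1.0 / np.sqrt(t)                 # decaying step size
    return project_to_simplex(x + eta * grad)
```

Algorithms in this family differ mainly in how the step size is modulated (e.g., GIGA-WoLF's win-or-learn-fast adjustment), which is where RV_σ(t)'s guarantees of both no-regret and convergence come into play.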
[1] Bikramjit Banerjee et al. Performance Bounded Reinforcement Learning in Strategic Interactions. AAAI, 2004.
[2] Michael H. Bowling et al. Convergence and No-Regret in Multiagent Learning. NIPS, 2004.
[3] Vincent Conitzer et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 2003.
[4] Martin Zinkevich et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML, 2003.
[5] Manuela M. Veloso et al. Multiagent learning using a variable learning rate. Artificial Intelligence, 2002.