Recently, an evolutionary model of Lenient Q-learning (LQ) has been proposed, providing theoretical guarantees of convergence to the global optimum in cooperative multi-agent learning. However, experiments reveal discrepancies between the predicted dynamics of the evolutionary model and the actual learning behavior of the Lenient Q-learning algorithm, which undermines its theoretical foundation. Moreover, it turns out that the predicted behavior of the model is more desirable than the observed behavior of the algorithm. We propose the variant Lenient Frequency Adjusted Q-learning (LFAQ), which inherits the theoretical guarantees and resolves this issue.

The advantages of LFAQ are demonstrated by comparing the evolutionary dynamics of lenient vs. non-lenient Frequency Adjusted Q-learning. In addition, we analyze the behavior, convergence properties, and performance of these two learning algorithms empirically. The algorithms are evaluated in the Battle of the Sexes (BoS) and the Stag Hunt (SH), while compensating for intrinsic learning speed differences. Significant deviations arise from the introduction of leniency, leading to profound performance gains in coordination games against both lenient and non-lenient learners.
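As a rough illustration of the two ingredients combined in LFAQ, the sketch below shows a single stateless (repeated-game) update. This is a hypothetical sketch, not the paper's exact formulation: leniency is rendered as updating only toward the maximum of several collected reward samples, and frequency adjustment as scaling the learning rate by min(beta / x_i, 1), where x_i is the probability of the chosen action. All names and default parameter values are assumptions for illustration.

```python
def lfaq_update(Q, policy, action, rewards, alpha=0.1, beta=0.01, gamma=0.0):
    """One hypothetical LFAQ update for a stateless repeated game.

    Q       -- list of Q-values, one per action
    policy  -- current action probabilities (x_i in the FAQ literature)
    rewards -- several reward samples collected for `action`;
               leniency means only the best sample drives the update
    """
    r = max(rewards)  # lenient: ignore the lower reward samples
    # frequency adjustment: rarely played actions get a proportionally
    # larger effective learning rate, capped at alpha itself
    lr = min(beta / policy[action], 1.0) * alpha
    Q[action] += lr * (r + gamma * max(Q) - Q[action])
    return Q
```

With gamma = 0 (as is natural for stateless matrix games), an action played with probability 0.5 and reward samples [1, 0, -1] yields an effective learning rate of min(0.01 / 0.5, 1) * 0.1 = 0.002 and an update toward the best sample, 1.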