Empirical and Theoretical Support for Lenient Learning (Extended Abstract)

Recently, an evolutionary model of Lenient Q-learning (LQ) has been proposed, providing theoretical guarantees of convergence to the global optimum in cooperative multi-agent learning. However, experiments reveal discrepancies between the predicted dynamics of the evolutionary model and the actual learning behavior of the Lenient Q-learning algorithm, which undermines its theoretical foundation. Moreover, it turns out that the predicted behavior of the model is more desirable than the observed behavior of the algorithm. We propose the variant Lenient Frequency Adjusted Q-learning (LFAQ), which inherits the theoretical guarantees and resolves this issue.

The advantages of LFAQ are demonstrated by comparing the evolutionary dynamics of lenient versus non-lenient Frequency Adjusted Q-learning. In addition, we analyze the behavior, convergence properties, and performance of these two learning algorithms empirically. The algorithms are evaluated in the Battle of the Sexes (BoS) and the Stag Hunt (SH), while compensating for intrinsic learning speed differences. Significant deviations arise from the introduction of leniency, leading to profound performance gains in coordination games against both lenient and non-lenient learners.
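To make the combination concrete, the following is a minimal sketch of one LFAQ update for a stateless (single-stage) game, assuming the standard frequency-adjusted update Q_a ← Q_a + min(β/x_a, 1)·α·(r + γ·max Q − Q_a) and leniency implemented as keeping only the best of κ rewards collected for an action. All function and parameter names here are illustrative, not taken from the paper:

```python
def lfaq_update(Q, policy, action, rewards, alpha=0.01, beta=0.1, gamma=0.0):
    """One illustrative Lenient Frequency Adjusted Q-learning update.

    Q       -- list of Q-values, one per action
    policy  -- current action probabilities x_a (used for frequency adjustment)
    action  -- index of the action that was played
    rewards -- the kappa payoffs collected for this action before updating
    """
    # Leniency: ignore the kappa-1 lower payoffs and learn from the best one.
    r = max(rewards)
    # Frequency adjustment: scale the learning rate by min(beta / x_a, 1),
    # so rarely played actions are updated as if played equally often.
    freq_adjust = min(beta / policy[action], 1.0)
    Q[action] += freq_adjust * alpha * (r + gamma * max(Q) - Q[action])
    return Q
```

For example, in a Stag Hunt where hunting stag yielded payoffs [1, 4, 4] over three interactions, leniency updates toward the cooperative payoff 4 rather than the miscoordination payoff 1, which is what drives convergence toward the Pareto-optimal equilibrium.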