Recently, an evolutionary model of Lenient Q-learning (LQ) has been proposed, providing theoretical guarantees of convergence to the global optimum in cooperative multi-agent learning. However, experiments reveal discrepancies between the predicted dynamics of the evolutionary model and the actual learning behavior of the Lenient Q-learning algorithm, which undermines its theoretical foundation. Moreover, it turns out that the predicted behavior of the model is more desirable than the observed behavior of the algorithm. We propose the variant Lenient Frequency Adjusted Q-learning (LFAQ), which inherits the theoretical guarantees and resolves this issue.

The advantages of LFAQ are demonstrated by comparing the evolutionary dynamics of lenient vs. non-lenient Frequency Adjusted Q-learning. In addition, we analyze the behavior, convergence properties, and performance of these two learning algorithms empirically. The algorithms are evaluated in the Battle of the Sexes (BoS) and the Stag Hunt (SH), while compensating for intrinsic learning speed differences. Significant deviations arise from the introduction of leniency, leading to profound performance gains in coordination games against both lenient and non-lenient learners.
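As a rough illustration of the two ingredients combined in LFAQ, the sketch below shows a single stateless (repeated-game) update. This is a hypothetical sketch, not the paper's exact formulation: leniency is rendered as updating only toward the maximum of several collected reward samples, and frequency adjustment as scaling the learning rate by min(beta / x_i, 1), where x_i is the probability of the chosen action. All names and default parameter values are assumptions for illustration.

```python
def lfaq_update(Q, policy, action, rewards, alpha=0.1, beta=0.01, gamma=0.0):
    """One hypothetical LFAQ update for a stateless repeated game.

    Q       -- list of Q-values, one per action
    policy  -- current action probabilities (x_i in the FAQ literature)
    rewards -- several reward samples collected for `action`;
               leniency means only the best sample drives the update
    """
    r = max(rewards)  # lenient: ignore the lower reward samples
    # frequency adjustment: rarely played actions get a proportionally
    # larger effective learning rate, capped at alpha itself
    lr = min(beta / policy[action], 1.0) * alpha
    Q[action] += lr * (r + gamma * max(Q) - Q[action])
    return Q
```

With gamma = 0 (as is natural for stateless matrix games), an action played with probability 0.5 and reward samples [1, 0, -1] yields an effective learning rate of min(0.01 / 0.5, 1) * 0.1 = 0.002 and an update toward the best sample, 1.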