Paper Information - The Two Facets of the Exploration-Exploitation Dilemma


This paper proposes an algorithm to better solve the exploration-exploitation dilemma faced by model-free reinforcement learning agents. The contribution is twofold. (1) We distinguish two facets of the exploration-exploitation dilemma: in some cases the agent faces a non-stationary environment, so it must choose the best moment to explore in order to adapt to changes; in other cases the agent faces a relatively large state-action space, so it must choose the most promising subset of states and actions to explore. Within this two-facet framework, we compare the relative advantages and limitations of two previously proposed algorithms in different situations. (2) We unify these two algorithms into a new algorithm that performs fairly well in all tested situations.

Wei Pan | Kaifu Zhang
