The Two Facets of the Exploration-Exploitation Dilemma
Wei Pan | Kaifu Zhang