Addressing the policy-bias of Q-learning by repeating updates
[1] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[2] Sandip Sen, et al. Emergence of Norms through Social Learning, 2007, IJCAI.
[3] Toshiharu Sugawara, et al. Emergence and Stability of Social Conventions in Conflict Situations, 2011, IJCAI.
[4] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[5] Michael H. Bowling, et al. Convergence and No-Regret in Multiagent Learning, 2004, NIPS.
[6] Michael L. Littman, et al. Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration, 2010, ICML.
[7] Karl Tuyls, et al. Frequency adjusted multi-agent Q-learning, 2010, AAMAS.
[8] Victor R. Lesser, et al. A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics, 2008, J. Artif. Intell. Res.
[9] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.
[10] Victor R. Lesser, et al. Multi-Agent Learning with Policy Prediction, 2010, AAAI.