Addressing the Policy-Bias of Q-Learning by Repeating Updates

Q-learning is a popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning exhibits artifacts when the optimal action is played with low probability, a situation that may arise from a poor initialization of Q-values or from convergence to an almost pure policy followed by a change in the environment that makes another action optimal. These artifacts were addressed in the literature by the variant Frequency Adjusted Q-learning (FAQL). However, FAQL suffers from practical concerns that limit the policy subspace over which its behavior is improved. Here, we introduce Repeated Update Q-learning (RUQL), a variant of Q-learning that resolves the undesirable artifacts of Q-learning without the practical concerns of FAQL. We show, both theoretically and experimentally, the similarities and differences between RUQL and FAQL (the closest state-of-the-art algorithm). Experimental results verify the theoretical insights and show that RUQL outperforms FAQL and Q-learning in non-stationary environments.
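For concreteness, the repeated-update idea can be sketched as follows; the notation and the closed-form expression below are an illustrative reading of "repeating updates", not quoted from the paper. The standard Q-learning update for a transition (s, a, r, s') with learning rate \(\alpha\) and discount \(\gamma\) is
\[
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,\bigl(r + \gamma \max_{a'} Q(s',a')\bigr).
\]
If this update is repeated \(1/\pi(s,a)\) times, where \(\pi(s,a)\) denotes the probability of selecting action \(a\) in state \(s\), the repetitions collapse into the single closed-form update
\[
Q(s,a) \leftarrow (1-\alpha)^{1/\pi(s,a)}\,Q(s,a) + \bigl(1-(1-\alpha)^{1/\pi(s,a)}\bigr)\bigl(r + \gamma \max_{a'} Q(s',a')\bigr),
\]
so that rarely chosen actions receive correspondingly stronger updates, counteracting the policy bias of vanilla Q-learning.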