Use All Your Skills, Not Only The Most Popular Ones

Reinforcement Learning (RL) has shown promising results across various domains. However, applying it to develop game-playing agents is challenging due to the sparsity of extrinsic rewards: agents receive a reward from the environment only at the end of a game level. Previous work has shown that intrinsic rewards are an effective way to handle such cases, as they allow basic skills to be incorporated into agent policies that then generalize better across game levels. During gameplay, it is common for certain actions (skills) to be observed more often than others, which biases the agent's action selection. This problem boils down to a normalization issue in the formulation of the skill-based reward function. In this paper, we propose a novel solution that accounts for the frequency of all skills in the reward function. We show that our method improves agent performance, enabling agents to select effective skills up to 2.5 times more frequently than the state-of-the-art approach, in the context of the match-3 game Candy Crush Friends Saga.
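
As a rough illustrative sketch only (the symbols and the specific normalization below are assumptions for exposition, not the paper's exact formulation), a frequency-aware intrinsic reward can down-weight skills that are observed often, so that rarely used but effective skills still contribute:

$$
r^{\text{int}}_t \;=\; \sum_{k \in \mathcal{K}} \frac{\mathbb{1}\!\left[\text{skill } k \text{ is triggered at step } t\right]}{f_k + \epsilon},
$$

where $\mathcal{K}$ is the set of skills, $f_k$ is the empirical frequency with which skill $k$ has been observed so far, and $\epsilon > 0$ avoids division by zero. Dividing by $f_k$ normalizes each skill's contribution, so the reward no longer favors only the most popular skills.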