Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration

In many real-world scenarios, the extrinsic rewards provided by the environment are sparse. An agent trained with classic reinforcement learning algorithms fails to explore such environments sufficiently and effectively. To address this problem, an exploration bonus derived from environmental novelty can serve as intrinsic motivation for the agent. In recent years, curiosity-driven exploration has become a mainstream approach, describing environmental novelty through the prediction errors of dynamics models. Due to the limited expressive ability of curiosity-based environmental novelty and the difficulty of finding an appropriate feature space, most curiosity-driven exploration methods suffer from overprotection against repetition. This problem reduces the efficiency of exploration and can trap the agent in a local optimum. In this article, we propose a framework that combines persisting curiosity and temporary curiosity to address overprotection against repetition. We draw on the self-attention mechanism from computer vision and propose a sequence-based self-attention mechanism for generating temporary curiosity. We compare our framework with previous exploration methods in hard-exploration environments, provide a comprehensive analysis of the proposed framework, and investigate the effect of its individual components. The experimental results indicate that the proposed framework delivers superior performance to existing methods.
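To make the idea concrete, the following is a minimal PyTorch sketch, not the paper's exact architecture, of how a persisting curiosity signal (forward-model prediction error, as in ICM-style methods) could be combined with a temporary curiosity signal computed by self-attention over a short window of recent state embeddings. The module names, layer sizes, distance measure, and mixing weight `beta` are illustrative assumptions.

```python
# Illustrative sketch only: persisting curiosity from a forward dynamics
# model plus temporary curiosity from self-attention over recent embeddings.
# All shapes and the additive mixing with `beta` are hypothetical choices.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next state embedding from the current embedding and action."""
    def __init__(self, emb_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, emb: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([emb, action], dim=-1))


class TemporaryCuriosity(nn.Module):
    """Scores the newest embedding against a window of recent embeddings
    with multi-head self-attention; low similarity -> high temporary novelty."""
    def __init__(self, emb_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)

    def forward(self, recent: torch.Tensor) -> torch.Tensor:
        # recent: (batch, window, emb_dim); the last position is the newest state.
        attended, _ = self.attn(recent, recent, recent)
        newest, context = recent[:, -1], attended[:, -1]
        # Novelty is the distance between the newest embedding and the
        # attention-weighted summary of the recent sequence.
        return (newest - context).pow(2).mean(dim=-1)


def intrinsic_reward(fwd, tmp, emb, action, next_emb, recent, beta=0.5):
    # Persisting curiosity: prediction error of the forward dynamics model.
    persisting = (fwd(emb, action) - next_emb).pow(2).mean(dim=-1)
    # Temporary curiosity: attention-based novelty over the recent sequence.
    temporary = tmp(recent)
    return persisting + beta * temporary  # hypothetical additive mixing


if __name__ == "__main__":
    emb_dim, action_dim, window, batch = 32, 4, 8, 2
    fwd, tmp = ForwardModel(emb_dim, action_dim), TemporaryCuriosity(emb_dim)
    emb = torch.randn(batch, emb_dim)
    action = torch.randn(batch, action_dim)
    next_emb = torch.randn(batch, emb_dim)
    recent = torch.randn(batch, window, emb_dim)
    print(intrinsic_reward(fwd, tmp, emb, action, next_emb, recent))
```

In this sketch the persisting term rewards states the dynamics model still predicts poorly, while the temporary term rewards states that differ from what the recent trajectory window would suggest, which is one way a sequence-based signal could counteract overprotection against repetition.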