State Representation Learning With Adjacent State Consistency Loss for Deep Reinforcement Learning

Through well-designed optimization paradigms and deep neural networks as feature extractors, deep reinforcement learning (DRL) algorithms learn optimal policies on both discrete and continuous action spaces. However, this capability is restricted by low sample efficiency. By examining the role of feature extraction in DRL, we find that state feature learning is one of the key obstacles to sample-efficient learning. To this end, we propose a new state representation learning scheme with an adjacent state consistency loss (ASC loss). The loss is based on the hypothesis that the distance between adjacent states should be smaller than that between far-apart ones, since scenes in videos generally evolve smoothly. We exploit the ASC loss as an auxiliary to the RL loss during training to boost state feature learning, and evaluate it on existing DRL algorithms as well as a behavioral cloning algorithm. Experiments on Atari games and MuJoCo continuous control tasks demonstrate the effectiveness of our scheme.
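
As a rough illustration only (not the paper's exact formulation), an adjacent-state-consistency objective of this kind can be sketched as a margin-based loss over embedded states; the encoder, the margin value, the negative-sampling strategy, and the weighting against the RL loss below are all assumptions made for this sketch.

```python
# Minimal sketch of an ASC-style auxiliary loss (illustrative assumptions,
# not the authors' exact formulation).
import torch
import torch.nn.functional as F

def asc_loss(encoder, s_t, s_next, s_far, margin=1.0):
    """Encourage adjacent states to lie closer in feature space than far-apart ones.

    s_t, s_next: batches of temporally adjacent states.
    s_far: batch of states sampled far away in time (negatives).
    """
    z_t = encoder(s_t)
    z_next = encoder(s_next)
    z_far = encoder(s_far)
    d_pos = F.pairwise_distance(z_t, z_next)  # distance to adjacent state
    d_neg = F.pairwise_distance(z_t, z_far)   # distance to far-apart state
    # Hinge: adjacent pairs should be closer than far-apart pairs by a margin.
    return F.relu(d_pos - d_neg + margin).mean()

# During training, such a term would be added to the RL objective, e.g.:
# total_loss = rl_loss + asc_weight * asc_loss(encoder, s_t, s_next, s_far)
```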