Combining Hindsight with Goal-enhanced Prediction for Multi-goal Reinforcement Learning

In multi-goal reinforcement learning (RL), learning efficiently from sparse rewards remains a major challenge. One of the most successful solutions to this challenge is Hindsight Experience Replay (HER), a model-free algorithm that relabels desired goals with achieved goals. However, HER and its existing variants remain sample-inefficient, requiring millions of samples for training. In this paper, we leverage a learned dynamics model and propose Hindsight Experience Replay with Model-based Prediction (HERO) to further improve the sample efficiency of HER. The core technique of HERO is a two-stage value estimation algorithm that combines hindsight-relabeled rewards with model-based predictive rewards. To model the complex dynamics of robot manipulation tasks effectively, we introduce the goal-enhanced predictive model (GPM) and achieved-goal variance prioritization (AVP). GPM places more emphasis on predicting the achieved goal in the next state, while AVP prioritizes trajectories according to the variance of the achieved goals within each trajectory. We evaluate HERO on a set of challenging robot manipulation tasks. Empirical results demonstrate that HERO achieves significantly higher sample efficiency than previous multi-goal RL algorithms.
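
To make two of the ideas named in the abstract concrete, the following is a minimal Python sketch of hindsight relabeling and of variance-based trajectory prioritization. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the relabeling strategy, the reward function, or how variance is turned into a sampling probability. Here we assume the "final" relabeling strategy, a sparse 0/-1 reward with a hypothetical tolerance `tol`, a priority equal to the summed per-dimension variance of achieved goals, and sampling proportional to that priority. The transition keys (`achieved_next`, `goal`, `reward`) and all function names are hypothetical.

```python
import math
import random
from statistics import pvariance


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 within `tol` of the goal, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_final(trajectory):
    """Hindsight relabeling with the 'final' strategy (an assumption):
    replace the desired goal of every transition with the goal actually
    achieved at the end of the trajectory, and recompute the reward."""
    new_goal = trajectory[-1]["achieved_next"]
    return [
        {**t, "goal": new_goal,
         "reward": sparse_reward(t["achieved_next"], new_goal)}
        for t in trajectory
    ]


def achieved_goal_variance(trajectory):
    """Assumed priority: sum over goal dimensions of the population
    variance of the achieved goals along the trajectory."""
    goals = [t["achieved_next"] for t in trajectory]
    return sum(pvariance(dim) for dim in zip(*goals))


def sample_trajectory(trajectories, rng, eps=1e-6):
    """Sample one trajectory with probability proportional to its
    achieved-goal variance; `eps` keeps static trajectories reachable."""
    weights = [achieved_goal_variance(tr) + eps for tr in trajectories]
    return rng.choices(trajectories, weights=weights, k=1)[0]


def make_trajectory(achieved_goals, desired_goal):
    """Helper for the example: build transitions that all missed the goal."""
    return [
        {"achieved_next": g, "goal": desired_goal, "reward": -1.0}
        for g in achieved_goals
    ]
```

For example, a trajectory built with `make_trajectory([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], (2.0, 2.0))` never reaches its desired goal, but after `relabel_final` its last transition receives reward 0 because the goal is now the final achieved position. Under the assumed priority, a trajectory in which the object moves is sampled far more often than one in which the achieved goal never changes.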