As a new earth observation tool, satellite video has been widely used in remote-sensing field for dynamic analysis. Video super-resolution (VSR) technique has thus attracted increasing attention due to its improvement to spatial resolution of satellite video. However, the difficulty of remote-sensing image alignment and the low efficiency of spatial–temporal information fusion make poor generalization of the conventional VSR methods applied to satellite videos. In this article, a novel fusion strategy of temporal grouping projection and an accurate alignment module are proposed for satellite VSR. First, we propose a deformable convolution alignment module with a multiscale residual block to alleviate the alignment difficulties caused by scarce motion and various scales of moving objects in remote-sensing images. Second, a temporal grouping projection fusion strategy is proposed, which can reduce the complexity of projection and make the spatial features of reference frames play a continuous guiding role in spatial–temporal information fusion. Finally, a temporal attention module is designed to adaptively learn the different contributions of temporal information extracted from each group. Extensive experiments on Jilin-1 satellite video demonstrate that our method is superior to current state-of-the-art VSR methods.