DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos

Moving object detection (MOD) in satellite videos remains challenging because the targets of interest are extremely small and the background is highly complex. Both intra-frame (static) and inter-frame (dynamic) information are important for MOD. In this letter, we propose a two-stream detection network, the dynamic and static fusion network (DSFNet), to tackle MOD in satellite videos. Specifically, DSFNet comprises a 2-D backbone that extracts static context information from a single frame and a lightweight 3-D backbone that extracts dynamic motion cues from consecutive frames. The extracted static and dynamic features are then fused and fed into a detection head to detect moving targets. Extensive experiments on videos collected by the Jilin-1 satellite demonstrate the effectiveness and robustness of the proposed DSFNet, which achieves state-of-the-art performance.
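
The two-stream data flow described above can be illustrated with a toy NumPy sketch. It is not the DSFNet implementation: the abstract does not specify the layers, the fusion operator, or the detection head, so every function here is a hypothetical, hand-crafted stand-in (a 3x3 local mean for the 2-D backbone, frame differencing for the 3-D backbone, channel stacking for fusion, and a weighted sum with a threshold for the head). Using the last frame of the clip as the static input is also an assumption.

```python
import numpy as np

def static_features(frame):
    """Stand-in for the 2-D backbone (assumption): 3x3 local mean of one frame (H, W)."""
    h, w = frame.shape
    padded = np.pad(frame, 1, mode="edge")
    out = np.zeros_like(frame, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def dynamic_features(clip):
    """Stand-in for the 3-D backbone (assumption): mean absolute difference
    between consecutive frames, mapping (T, H, W) to (H, W)."""
    return np.abs(np.diff(clip, axis=0)).mean(axis=0)

def fuse(static, dynamic):
    """Hypothetical fusion: stack the two feature maps as channels (2, H, W)."""
    return np.stack([static, dynamic], axis=0)

def detection_head(fused, weights=(0.5, 0.5), threshold=0.1):
    """Toy head (assumption): weighted channel sum, then a fixed threshold."""
    score = np.tensordot(np.asarray(weights), fused, axes=1)
    return score > threshold

def detect(clip):
    """Run both streams on a clip (T, H, W) and return a boolean mask (H, W)."""
    static = static_features(clip[-1])   # single frame: the last one, by assumption
    dynamic = dynamic_features(clip)     # consecutive frames
    return detection_head(fuse(static, dynamic))

# Synthetic clip: a one-pixel target moving right along row 3 of an empty scene.
clip = np.zeros((5, 8, 8))
for t in range(5):
    clip[t, 3, t + 1] = 1.0
mask = detect(clip)
```

In this sketch the dynamic stream responds along the target's trajectory while the static stream only adds local appearance context around its current position. A learned network would replace each stand-in with trained convolutional layers.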