Spatio-Temporal Dual-Branch Network With Predictive Feature Learning for Satellite Video Object Segmentation

Satellite video is an important new earth observation data source that can be used to acquire large-scale dynamic information. Satellite video object segmentation (SVOS) aims to separate the foreground from the background of satellite video and is a fundamental processing task for this data. To date, state-of-the-art research into SVOS has mainly focused on unsupervised target extraction methods based on hand-crafted features and post-processing operations, which are prone to producing incomplete target contours and suffer from the foreground aperture problem. Furthermore, the small size of the targets and their appearance deformation make SVOS more difficult. Therefore, in this article, a spatio-temporal dual-branch network with predictive feature learning is proposed for the SVOS task. The proposed model consists of a temporal coherence branch and a spatial segmentation branch. In the temporal coherence branch, the Wasserstein generative adversarial network (WGAN) architecture is used for future frame prediction to exploit temporal information: a predictive feature learning module captures dynamic appearance and motion cues from the unlabeled satellite video data in an adversarial manner. As a result, the proposed method can obtain temporally consistent segmentation results while avoiding the generation of optical flow images. In the spatial segmentation branch, a fully convolutional network (FCN) is used to extract the high-level spatial information of the satellite video and achieve end-to-end SVOS without any post-processing operations. In the network implementation, a boundary loss is used to address the highly unbalanced segmentation problem caused by the small size of the targets. The two branches of the network are also mutually constrained to improve the final object segmentation results. The visual and quantitative results of three experiments all demonstrate that the proposed method outperforms the other current SVOS models.
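
The two loss ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Wasserstein critic and generator objectives for the temporal coherence branch, and it assumes that "boundary loss" refers to the distance-map formulation commonly used for highly unbalanced segmentation, in which foreground probabilities are weighted by a signed distance to the ground-truth object. All function names are hypothetical, the signed distance is a brute-force simplification for tiny masks, and plain Python lists stand in for tensors.

```python
import math


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: mean score on predicted (fake) frames
    minus mean score on real frames. The critic minimizes this value."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_adv_loss(fake_scores):
    """Wasserstein generator loss: the frame predictor tries to raise
    the critic's score on its predicted frames."""
    return -sum(fake_scores) / len(fake_scores)


def signed_distance_map(mask):
    """Brute-force signed distance for a small binary mask (assumption:
    distance to the nearest pixel of the opposite class, negative inside
    the foreground and positive outside)."""
    h, w = len(mask), len(mask[0])
    fg = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    bg = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    phi = [[0.0] * w for _ in range(h)]
    if not fg or not bg:
        # No boundary exists, so the map is left at zero.
        return phi
    for r in range(h):
        for c in range(w):
            others = bg if mask[r][c] else fg
            d = min(math.hypot(r - rr, c - cc) for rr, cc in others)
            phi[r][c] = -d if mask[r][c] else d
    return phi


def boundary_loss(probs, mask):
    """Mean of predicted foreground probability times signed distance.
    Probability mass placed far outside the target is penalized in
    proportion to its distance, regardless of how small the target is."""
    phi = signed_distance_map(mask)
    h, w = len(mask), len(mask[0])
    return sum(probs[r][c] * phi[r][c] for r in range(h) for c in range(w)) / (h * w)


# Toy example: a single-pixel target in a 3x3 frame.
gt = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
good = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
bad = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss_good = boundary_loss(good, gt)
loss_bad = boundary_loss(bad, gt)
```

In this toy case the prediction that covers the single target pixel yields a lower boundary loss than the one that fires on a distant corner pixel, even though the target occupies only one of nine pixels. A practical implementation would compute the distance map with a tensor library and combine these terms with the segmentation and prediction objectives. The abstract does not specify how the terms are weighted, so no combined loss is shown here.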