MSTDSNet-CD: Multiscale Swin Transformer and Deeply Supervised Network for Change Detection of the Fast-Growing Urban Regions

Deep learning algorithms have recently provided new ideas for various change detection (CD) tasks and have yielded promising results. However, accurately identifying urban land cover and land use (LCLU) changes in very high-resolution (VHR) remote sensing images remains challenging, because it is difficult to effectively model the features of ground objects observed at different times and at different spatial locations. In this letter, a multiscale Swin transformer (MST)-based deeply supervised network (MSTDSNet) is proposed for monitoring urban LCLU changes from bi-temporal VHR remote sensing images. A wider and deeper layer aggregation (WDLA) module is first introduced to improve the distinguishability of multiscale features. An MST is then adopted to fully exploit the spatial information contained in the refined multiscale features. Moreover, channel-wise dependencies (CWDs) are integrated as a soft constraint to directly supervise the WDLA module. Experiments are conducted on the widely used SYSU-CD and LEVIR-CD datasets. Compared with other state-of-the-art CD methods, MSTDSNet achieves favorable performance, with the highest F1 scores of 80.33% and 88.10%, respectively.
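To make the deep-supervision idea in the abstract concrete, the sketch below shows how auxiliary losses on intermediate, lower-resolution change maps can be combined with the main full-resolution loss. This is a minimal illustration under generic assumptions, not the authors' implementation: the class name DeeplySupervisedCDLoss, the binary cross-entropy formulation, the auxiliary weight, and the tensor shapes are all hypothetical.

```python
# Illustrative sketch (assumed, not the paper's code): deep supervision over
# multiscale change-map predictions from a bi-temporal CD network in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeeplySupervisedCDLoss(nn.Module):
    """Main change-map loss plus weighted auxiliary losses on intermediate
    (coarser) side outputs, each upsampled to the label resolution."""

    def __init__(self, aux_weight: float = 0.4):
        super().__init__()
        self.aux_weight = aux_weight
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, main_logits, aux_logits_list, label):
        # main_logits: (B, 1, H, W); label: (B, 1, H, W) binary change mask.
        loss = self.bce(main_logits, label)
        for aux in aux_logits_list:
            # Upsample each coarse side output to the label size before the loss.
            aux_up = F.interpolate(aux, size=label.shape[-2:],
                                   mode="bilinear", align_corners=False)
            loss = loss + self.aux_weight * self.bce(aux_up, label)
        return loss


# Usage with hypothetical shapes: one full-resolution prediction and two
# coarser side outputs, as a stand-in for multiscale decoder features.
criterion = DeeplySupervisedCDLoss(aux_weight=0.4)
main = torch.randn(2, 1, 256, 256)
aux = [torch.randn(2, 1, 64, 64), torch.randn(2, 1, 32, 32)]
label = torch.randint(0, 2, (2, 1, 256, 256)).float()
loss = criterion(main, aux, label)
```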