IRU-Net: An Efficient End-to-End Network for Automatic Building Extraction From Remote Sensing Images

Automatic extraction of buildings from high-resolution Remote Sensing (RS) imagery is of great practical interest for numerous applications, including urban planning, change detection, disaster management, human population estimation, and many other geospatial applications. This paper proposes a novel, efficient improved ResU-Net architecture called IRU-Net, which integrates a spatial pyramid pooling module into an encoder-decoder structure, in combination with atrous convolutions, modified residual connections, and a new skip connection between the encoder and decoder features, for automatic extraction of buildings from RS images. Moreover, a new dual loss function called binary cross-entropy-dice-loss (BCEDL) is adopted; it combines cross-entropy (CE) and dice loss (DL) to account for both local and global information, reducing the influence of class imbalance and improving building extraction results. The proposed model is evaluated on two publicly available datasets to demonstrate its generalization: the Aerial Images for Roof Segmentation (AIRS) dataset and the Massachusetts buildings dataset. The proposed IRU-Net achieved an average F1 score of 92.34% on the Massachusetts dataset and 95.65% on the AIRS dataset. Compared to other state-of-the-art deep learning models such as SegNet, U-Net, E-Net, ERFNet, and SRI-Net, the overall accuracy improvements of our IRU-Net model are 9.0% (0.9725 vs. 0.8842), 5.2% (0.9725 vs. 0.9218), 3.0% (0.9725 vs. 0.9428), 1.4% (0.9725 vs. 0.9588), and 0.93% (0.9725 vs. 0.9635) on the AIRS dataset, and 11.6%, 5.9%, 3.1%, 2.7%, and 1.4% on the Massachusetts buildings dataset. These results demonstrate the effectiveness of the proposed model for building extraction from high-resolution RS images.
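The BCEDL dual loss can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's exact formulation: the abstract does not specify the relative weighting of the two terms, so the sketch below simply sums an unweighted binary cross-entropy term (pixel-wise, local) and a Dice term (overlap-based, global); the function name `bce_dice_loss` and the smoothing constant `eps` are hypothetical.

```python
import numpy as np

def bce_dice_loss(pred, target, eps=1e-7):
    """Illustrative BCEDL: binary cross-entropy + Dice loss.

    pred   -- predicted building probabilities in [0, 1]
    target -- ground-truth binary building mask (0 or 1)
    Weighting of the two terms is assumed 1:1 here.
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    # Pixel-wise binary cross-entropy (local information)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    # Dice loss over the whole mask (global overlap, robust to class imbalance)
    intersection = np.sum(pred * target)
    dice = 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return bce + dice
```

Because the Dice term is computed over the entire mask, it remains informative even when building pixels are a small fraction of the image, which is the class-imbalance scenario the combined loss is meant to mitigate.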