Semantic Vehicle Segmentation in Very High Resolution Multispectral Aerial Images Using Deep Neural Networks

The fusion of complementary information from co-registered multi-modal image data enables a more detailed and more robust understanding of an image scene or specific objects, and is important for several applications in the field of remote sensing. In this paper, the benefits of combining RGB, near infrared (NIR) and thermal infrared (TIR) aerial images for the task of semantic vehicle segmentation through deep neural networks are investigated. Therefore, RGB, NIR and TIR image triplets acquired by the Modular Aerial Camera System (MACS) are precisely co-registered through the application of a virtual camera system and subsequently used for the training of different neural network architectures. Various experiments were conducted to investigate the influence of the different sensor characteristics and an early or late fusion within the network on the quality of the segmentation results.