RESIDUAL SHUFFLING CONVOLUTIONAL NEURAL NETWORKS FOR DEEP SEMANTIC IMAGE SEGMENTATION USING MULTI-MODAL DATA

In this paper, we address deep semantic segmentation of aerial imagery based on multi-modal data. Given multi-modal data composed of true orthophotos and the corresponding Digital Surface Models (DSMs), we extract a variety of hand-crafted radiometric and geometric features, which are provided separately and in different combinations as input to a modern deep learning framework. This framework is a Residual Shuffling Convolutional Neural Network (RSCNN), which combines the characteristics of a Residual Network with the advantages of atrous convolution and a shuffling operator to achieve a dense semantic labeling. Via performance evaluation on a benchmark dataset, we analyze the value of different feature sets for the semantic segmentation task. The derived results reveal that, for the considered dataset, radiometric features yield better classification results than geometric features. Furthermore, using data from both modalities improves the classification results. However, the derived results also indicate that using all defined features is less favorable than using selected features. Consequently, data representations derived via feature extraction and feature selection techniques still provide a gain if used as the basis for deep semantic segmentation.
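
The abstract names a shuffling operator but does not define it. As a minimal sketch, assuming the operator is the periodic shuffling ("pixel shuffle") used in sub-pixel convolutional networks, the following NumPy function rearranges a low-resolution feature map with C·r² channels into a map with C channels at r times the spatial resolution. The function name `pixel_shuffle`, the channels-first layout, and the upscaling factor `r` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an array of shape (C*r*r, H, W) into (C, H*r, W*r).

    Assumed convention (hypothetical, not from the paper):
    out[c, h*r + i, w*r + j] == x[c*r*r + i*r + j, h, w]
    """
    channels_in, height, width = x.shape
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = channels_in // (r * r)
    # Split the channel axis into (c, i, j), where i and j are sub-pixel offsets.
    x = x.reshape(c, r, r, height, width)
    # Interleave the offsets with the spatial axes: (c, h, i, w, j).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge (h, i) into rows and (w, j) into columns.
    return x.reshape(c, height * r, width * r)


# Four low-resolution 2x2 channels become one 4x4 channel.
low_res = np.arange(16).reshape(4, 2, 2)
high_res = pixel_shuffle(low_res, r=2)
```

Under this convention each group of r² channels supplies the r × r block of output pixels at the corresponding low-resolution location, so upsampling to a dense labeling needs no interpolation and no learned deconvolution.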
