Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks

This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. Especially, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: (1) we transfer efficiently a DFCNN from generic everyday images to remote sensing images; (2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; (3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.

[1]  Markus Gerke,et al.  The ISPRS benchmark on urban object classification and 3D building reconstruction , 2012 .

[2]  Michael Cramer,et al.  The DGPF-Test on Digital Airborne Camera Evaluation - Over- view and Test Design , 2010 .

[3]  Gustavo Carneiro,et al.  Competitive Multi-scale Convolution , 2015, ArXiv.

[4]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[5]  Uwe Stilla,et al.  SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS , 2016 .

[6]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[7]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[9]  Shihong Du,et al.  Learning multiscale and deep representations for classifying remotely sensed imagery , 2016 .

[10]  Renjie Liao,et al.  Video Super-Resolution via Deep Draft-Ensemble Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  J. D. Wegner,et al.  SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS , 2016, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences.

[13]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[14]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[15]  Alexandre Boulch,et al.  Processing of Extremely High-Resolution LiDAR and RGB Data: Outcome of the 2015 IEEE GRSS Data Fusion Contest–Part A: 2-D Contest , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[16]  Wolfram Burgard,et al.  Multimodal deep learning for robust RGB-D object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[17]  Jefersson Alex dos Santos,et al.  Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[18]  Jamie Sherrah,et al.  Effective semantic pixel labelling with convolutional networks and Conditional Random Fields , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[19]  Philip H. S. Torr,et al.  Higher Order Conditional Random Fields in Deep Neural Networks , 2015, ECCV.

[20]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Alexandre Boulch,et al.  Benchmarking classification of earth-observation data: From learning explicit features to convolutional networks , 2015, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[22]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[25]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[27]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[28]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Anton van den Hengel,et al.  High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks , 2016, ArXiv.

[31]  Geoffrey E. Hinton,et al.  Learning to Detect Roads in High-Resolution Aerial Images , 2010, ECCV.

[32]  Huynh Thi Thanh Binh,et al.  An Efficient Framework for Pixel-wise Building Segmentation from Aerial Images , 2015, SoICT.

[33]  Uwe Stilla,et al.  Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks , 2016, IEEE Geoscience and Remote Sensing Letters.

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yann LeCun,et al.  Stacked What-Where Auto-encoders , 2015, ArXiv.

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Yizhou Yu,et al.  Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation , 2016, ArXiv.

[38]  Markus Gerke,et al.  Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen) , 2014 .

[39]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.