Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images

Abstract When approaching the semantic segmentation of overhead imagery in the decimeter spatial resolution range, successful strategies usually combine powerful methods to learn the visual appearance of the semantic classes (e.g. convolutional neural networks) with strategies for spatial regularization (e.g. graphical models such as conditional random fields). In this paper, we propose a method to learn evidence in the form of semantic class likelihoods, semantic boundaries across classes and shallow-to-deep visual features, each one modeled by a multi-task convolutional neural network architecture. We combine this bottom-up information with top-down spatial regularization encoded by a conditional random field model optimizing the label space across a hierarchy of segments with constraints related to structural, spatial and data-dependent pairwise relationships between regions. Our results show that such strategy provide better regularization than a series of strong baselines reflecting state-of-the-art technologies. The proposed strategy offers a flexible and principled framework to include several sources of visual and structural information, while allowing for different degrees of spatial regularization accounting for priors about the expected output structures.

[1]  Jordi Pont-Tuset,et al.  Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  J. Besag Spatial Interaction and the Statistical Analysis of Lattice Systems , 1974 .

[3]  Uwe Stilla,et al.  Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection , 2016, ISPRS Journal of Photogrammetry and Remote Sensing.

[4]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[5]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[6]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Vladimir Kolmogorov,et al.  Optimizing Binary MRFs via Extended Roof Duality , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Alexandre Boulch,et al.  Processing of Extremely High-Resolution LiDAR and RGB Data: Outcome of the 2015 IEEE GRSS Data Fusion Contest–Part A: 2-D Contest , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[9]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Neill W. Campbell,et al.  Interpreting image databases by region classification , 1997, Pattern Recognit..

[11]  Michele Volpi,et al.  Semantic segmentation of urban scenes by learning local class interactions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Michael E. Schaepman,et al.  Determination of grassland use intensity based on multi-temporal remote sensing data and ecological indicators , 2017 .

[13]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Thomas Mauthner,et al.  Semantic Classification in Aerial Imagery by Integrating Appearance and Height Information , 2009, ACCV.

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Stephen Gould,et al.  Multi-Class Segmentation with Relative Location Prior , 2008, International Journal of Computer Vision.

[17]  Michele Volpi,et al.  Land cover mapping at very high resolution with rotation equivariant CNNs: towards small yet accurate models , 2018, ISPRS Journal of Photogrammetry and Remote Sensing.

[18]  Yan Wang,et al.  DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Deepak Khare,et al.  Monitoring and modelling of urban sprawl using remote sensing and GIS techniques , 2008, Int. J. Appl. Earth Obs. Geoinformation.

[20]  Ping Zhong,et al.  A Multiple Conditional Random Fields Ensemble Model for Urban Area Detection in Remote Sensing Optical Images , 2007, IEEE Transactions on Geoscience and Remote Sensing.

[21]  Alexandre Boulch,et al.  Benchmarking classification of earth-observation data: From learning explicit features to convolutional networks , 2015, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[22]  Joachim Höhle,et al.  Generating Topographic Map Data from Classification Results , 2017, Remote. Sens..

[23]  Jamie Sherrah,et al.  Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery , 2016, ArXiv.

[24]  Vittorio Ferrari,et al.  Situational object boundary detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[28]  Bertrand Le Saux,et al.  Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks , 2017, ISPRS Journal of Photogrammetry and Remote Sensing.

[29]  Gabriele Moser,et al.  A New Cascade Model for the Hierarchical Joint Classification of Multitemporal and Multiresolution Remote Sensing Data , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[30]  Yun Zhang,et al.  Semantic Segmentation of Remote Sensing Imagery Using an Object-Based Markov Random Field Model With Auxiliary Label Fields , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[31]  Andrew Zisserman,et al.  Pylon Model for Semantic Segmentation , 2011, NIPS.

[32]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[33]  Iasonas Kokkinos,et al.  Pushing the Boundaries of Boundary Detection using Deep Learning , 2015, ICLR 2016.

[34]  Markus Gerke,et al.  Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen) , 2014 .

[35]  Hassan Ghassemian,et al.  Integrating Hierarchical Segmentation Maps With MRF Prior for Classification of Hyperspectral Images in a Bayesian Framework , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[36]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[37]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[38]  Xiaopeng Zhang,et al.  Robust Rooftop Extraction From Visible Band Images Using Higher Order CRF , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[39]  Michael Ying Yang,et al.  Review of Automatic Feature Extraction from High-Resolution Optical Sensor Data for UAV-Based Cadastral Mapping , 2016, Remote. Sens..

[40]  Christian Heipke,et al.  Conditional Random Fields for Multitemporal and Multiscale Classification of Optical Satellite Imagery , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[41]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  M. Keller,et al.  Selective Logging in the Brazilian Amazon , 2005, Science.

[43]  Michele Volpi,et al.  Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[44]  Pierre Alliez,et al.  High-Resolution Aerial Image Labeling With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.