Improving Maps from CNNs Trained with Sparse, Scribbled Ground Truths Using Fully Connected CRFs

Convolutional Neural Networks (CNNs) have become the standard for semantic segmentation of very high resolution images. As with other methods, however, map accuracy depends on the quantity and quality of the ground truth used to train them. Densely annotated data, i.e., a detailed, pixel-level ground truth (GT), yields effective models but requires substantial annotation effort. For this reason, it is more common and more efficient to work with point or scribbled annotations than with dense ones. A CNN trained with such incomplete ground truths tends to mischaracterize the shapes of objects and to be inaccurate near their boundaries. To address these issues, we propose to use an approximation of a fully connected Conditional Random Field (CRF) in which long-range connections are accounted for through auxiliary nodes obtained by clustering CNN activation features. Experiments on the ISPRS Vaihingen benchmark, where a CNN is trained only with a non-dense, scribbled ground truth, show that the proposed method can close part of the performance gap with respect to models trained on the densely annotated, but unrealistic, ground truth.
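As a rough illustration of the idea of auxiliary nodes built from clustered activation features, the following minimal sketch may help. It is not the formulation used in the paper: the function names, the plain k-means clustering, the averaging rule for the auxiliary nodes, and the `weight` parameter are all assumptions made for this example. Each pixel is linked to one auxiliary node (its feature cluster), and a simplified mean-field style update pulls the pixel's label distribution toward the average belief of that cluster, so pixels that are far apart in the image but similar in feature space can influence each other.

```python
import numpy as np


def kmeans(feats, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row of feats."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = feats[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign


def refine_with_auxiliary_nodes(probs, feats, k=2, weight=2.0, n_iter=5):
    """Simplified refinement (hypothetical, for illustration only).

    probs: (N, C) class probabilities from the CNN, one row per pixel.
    feats: (N, D) CNN activation features, one row per pixel.
    """
    assign = kmeans(np.asarray(feats, dtype=float), k)
    q = probs.copy()
    for _ in range(n_iter):
        # Belief of each auxiliary node: mean belief of its member pixels.
        aux = np.zeros((k, probs.shape[1]))
        for j in range(k):
            mask = assign == j
            if mask.any():
                aux[j] = q[mask].mean(axis=0)
        # Combine the CNN unary term with the pull of the auxiliary node.
        logits = np.log(probs + 1e-12) + weight * aux[assign]
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1), q


# Toy data: six "pixels", two classes, two well-separated feature groups.
feats = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
probs = np.array([
    [0.90, 0.10],
    [0.90, 0.10],
    [0.45, 0.55],  # weakly mislabeled pixel in the first feature group
    [0.10, 0.90],
    [0.10, 0.90],
    [0.20, 0.80],
])
labels, q = refine_with_auxiliary_nodes(probs, feats, k=2, weight=2.0)
print(labels)
```

In this toy setting the third pixel, which the unary term alone would assign to the second class, is drawn toward the label favored by the other pixels in its feature cluster. A full model would also include short-range spatial terms and a principled inference scheme, which this sketch leaves out.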
