Towards Automatic Annotation for Semantic Segmentation in Drone Videos

Semantic segmentation is a crucial task for robot navigation and safety. However, it requires huge amounts of pixelwise annotations to yield accurate results. While recent progress in computer vision algorithms has been heavily boosted by large ground-level datasets, the labeling time has hampered progress in low altitude UAV applications, mostly due to the difficulty imposed by large object scales and pose variations. Motivated by the lack of a large video aerial dataset, we introduce a new one, with high resolution (4K) images and manually-annotated dense labels every 50 frames. To help the video labeling process, we make an important step towards automatic annotation and propose SegProp, an iterative flow-based method with geometric constrains to propagate the semantic labels to frames that lack human annotations. This results in a dataset with more than 50k annotated frames - the largest of its kind, to the best of our knowledge. Our experiments show that SegProp surpasses current state-of-the-art label propagation methods by a significant margin. Furthermore, when training a semantic segmentation deep neural net using the automatically annotated frames, we obtain a compelling overall performance boost at test time of 16.8% mean F-measure over a baseline trained only with manually-labeled frames. Our Ruralscapes dataset, the label propagation code and a fast segmentation tool are available at our website: this https URL

[1]  Martial Hebert,et al.  An Integer Projected Fixed Point Method for Graph Matching and MAP Inference , 2009, NIPS.

[2]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Stan Z. Li,et al.  Markov Random Field Models in Computer Vision , 1994, ECCV.

[4]  Ruigang Yang,et al.  The ApolloScape Dataset for Autonomous Driving , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[6]  Emil Slusanschi,et al.  SafeUAV: Learning to Estimate Depth and Safe Landing Areas for UAVs from Synthetic Data , 2018, ECCV Workshops.

[7]  Michael Ying Yang,et al.  UAVid: A semantic segmentation dataset for UAV imagery , 2018 .

[8]  Chen Huang,et al.  Ensemble Knowledge Transfer for Semantic Segmentation , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[9]  Ignas Budvytis,et al.  Large Scale Labelled Video Data Augmentation for Semantic Segmentation in Driving Scenarios , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10]  Shawn D. Newsam,et al.  Improving Semantic Segmentation via Video Propagation and Label Relaxation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Ignas Budvytis,et al.  Label propagation in complex video sequences using semi-supervised learning , 2010, BMVC.

[12]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[13]  Marc Van Droogenbroeck,et al.  Mid-Air: A Multi-Modal Dataset for Extremely Low Altitude Drone Flights , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Michael Ying Yang,et al.  The UAVid Dataset for Video Semantic Segmentation , 2018, ArXiv.