UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information

Semantic segmentation of aerial videos is widely used to support decision making in environmental change monitoring, urban planning, and disaster management. The reliability of these decision support systems depends on the accuracy of the video semantic segmentation algorithms. Existing CNN-based video semantic segmentation methods extend image semantic segmentation methods with an additional module, such as an LSTM or optical flow, to compute the temporal dynamics of the video, which adds computational overhead. The proposed work instead modifies the CNN architecture itself to incorporate temporal information, improving the efficiency of video semantic segmentation. An enhanced encoder-decoder CNN architecture, UVid-Net, is proposed for UAV video semantic segmentation. The encoder embeds temporal information for temporally consistent labelling. The decoder is enhanced with a feature retainer module, which aids accurate localization of the class labels. UVid-Net is quantitatively evaluated on an extended ManipalUAVid dataset, where it achieves an mIoU of 0.79, significantly higher than the other state-of-the-art algorithms compared. Further, a UVid-Net model pre-trained on urban street scenes produced promising results when only its final layer was fine-tuned on UAV aerial videos.
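For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed. It assumes the standard definition (per-class intersection divided by union, averaged over classes that appear in either the prediction or the ground truth); the abstract does not state the exact averaging convention used, and the function name `mean_iou` and the toy label maps are illustrative only.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over flattened label maps (assumed standard definition).

    pred, target: equal-length sequences of integer class labels.
    Classes absent from both pred and target are skipped in the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel agrees: counts once toward intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel disagrees: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2. For a video dataset, the counts would typically be accumulated over all evaluated frames before dividing, though that choice is also an assumption here.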
