Semantic Segmentation of UAV Aerial Videos using Convolutional Neural Networks

Semantic segmentation of complex aerial videos enables a better understanding of scene and context. This improves the performance of automated video processing techniques such as anomaly detection, object detection, and event detection. However, semantic segmentation of aerial videos has received limited study because no relevant dataset is available. To address this, an aerial video dataset is captured using a DJI Phantom 3 Professional drone and annotated manually. In addition, this work investigates the performance of semantic segmentation algorithms for aerial videos implemented using Fully Convolutional Network (FCN) and U-Net architectures. Two classes (greenery and road) are considered for semantic segmentation. Both architectures perform competitively on the Unmanned Aerial Vehicle (UAV) videos, with FCN and U-Net reaching pixel accuracies of 89.7% and 87.31%, respectively.
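
Pixel accuracy, the metric reported above, is the fraction of pixels whose predicted class matches the annotated class. The following is a minimal sketch of that computation, not the evaluation code used in this work. The label ids (`GREENERY`, `ROAD`), the function name `pixel_accuracy`, and the toy 2x4 masks are all hypothetical and chosen only for illustration; the abstract does not state how labels are encoded or whether a background class is counted.

```python
def pixel_accuracy(pred, target):
    """Return the fraction of pixels where pred and target agree.

    Both arguments are 2-D label maps given as lists of rows of integer
    class ids. This encoding is an assumption made for the example.
    """
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth differ in height")
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("prediction and ground truth differ in width")
        # Count matching labels in this row.
        correct += sum(p == t for p, t in zip(pred_row, target_row))
        total += len(target_row)
    if total == 0:
        raise ValueError("label maps are empty")
    return correct / total


# Hypothetical class ids for the two classes named in the abstract.
GREENERY, ROAD = 0, 1

# Toy ground-truth mask and a prediction that gets one pixel wrong.
target = [
    [GREENERY, GREENERY, ROAD, ROAD],
    [GREENERY, GREENERY, ROAD, ROAD],
]
pred = [
    [GREENERY, GREENERY, ROAD, ROAD],
    [GREENERY, ROAD, ROAD, ROAD],
]

acc = pixel_accuracy(pred, target)
print(f"pixel accuracy: {acc:.3f}")
```

For full video frames an array library would normally replace the explicit loops, but the definition of the metric is the same. Note that pixel accuracy can look high when one class dominates the frame, so it is usually read alongside per-class measures.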
