Vehicle Instance Segmentation From Aerial Image and Video Using a Multitask Learning Residual Fully Convolutional Network

Object detection and semantic segmentation are two main themes in object retrieval from high-resolution remote sensing images, which have recently achieved remarkable performance by surfing the wave of deep learning and, more notably, convolutional neural networks. In this paper, we are interested in a novel, more challenging problem of vehicle instance segmentation, which entails identifying, at a pixel level, where the vehicles appear as well as associating each pixel with a physical instance of a vehicle. In contrast, vehicle detection and semantic segmentation each only concern one of the two. We propose to tackle this problem with a semantic boundary-aware multitask learning network. More specifically, we utilize the philosophy of residual learning to construct a fully convolutional network that is capable of harnessing multilevel contextual feature representations learned from different residual blocks. We theoretically analyze and discuss why residual networks can produce better probability maps for pixelwise segmentation tasks. Then, based on this network architecture, we propose a unified multitask learning network that can simultaneously learn two complementary tasks, namely, segmenting vehicle regions and detecting semantic boundaries. The latter subproblem is helpful for differentiating “touching” vehicles that are usually not correctly separated into instances. Currently, data sets with a pixelwise annotation for vehicle extraction are the ISPRS data set and the IEEE GRSS DFC2015 data set over Zeebrugge, which specializes in a semantic segmentation. Therefore, we built a new, more challenging data set for vehicle instance segmentation, called the Busy Parking Lot Unmanned Aerial Vehicle Video data set, and we make our data set available at http://www.sipeo.bgu.tum.de/downloads so that it can be used to benchmark future vehicle instance segmentation algorithms.

[1]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bertrand Le Saux,et al.  Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images , 2017, Remote. Sens..

[3]  Xiao Xiang Zhu,et al.  RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images , 2018, ArXiv.

[4]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[5]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[6]  Naif Alajlan,et al.  Deep Learning Approach for Car Detection in UAV Imagery , 2017, Remote. Sens..

[7]  Xiao Xiang Zhu,et al.  Deep Recurrent Neural Networks for Hyperspectral Image Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[8]  Xiao Xiang Zhu,et al.  Learning Spectral-Spatial-Temporal Features via a Recurrent Convolutional Neural Network for Change Detection in Multispectral Imagery , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[9]  Bo Du,et al.  A post-classification change detection method based on iterative slow feature analysis and Bayesian soft fusion , 2017, Remote Sensing of Environment.

[10]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bertrand Le Saux,et al.  Fusion of heterogeneous data in convolutional networks for urban semantic labeling , 2017, 2017 Joint Urban Remote Sensing Event (JURSE).

[13]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[14]  Michael Kampffmeyer,et al.  Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Timothy Dozat,et al.  Incorporating Nesterov Momentum into Adam , 2016 .

[18]  Lichao Mou,et al.  Learning a Transferable Change Rule from a Recurrent Neural Network for Land Cover Change Detection , 2016, Remote. Sens..

[19]  Michele Volpi,et al.  Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[20]  Shiming Xiang,et al.  Vehicle Detection in Satellite Images by Hybrid Deep Convolutional Neural Networks , 2014, IEEE Geoscience and Remote Sensing Letters.

[21]  Jie Liu,et al.  Car detection from high-resolution aerial imagery using multiple features , 2012, 2012 IEEE International Geoscience and Remote Sensing Symposium.

[22]  Jon Atli Benediktsson,et al.  A Novel Automatic Change Detection Method for Urban High-Resolution Remotely Sensed Imagery Based on Multiindex Scene Representation , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[23]  Anton van den Hengel,et al.  High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks , 2016, ArXiv.

[24]  Qingshan Liu,et al.  Cascaded Recurrent Neural Networks for Hyperspectral Image Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[25]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Gellért Máttyus,et al.  Fast Multiclass Vehicle Detection on Aerial Images , 2015, IEEE Geoscience and Remote Sensing Letters.

[28]  Xiao Xiang Zhu,et al.  IM2HEIGHT: Height Estimation from Single Monocular Imagery via Fully Residual Convolutional-Deconvolutional Network , 2018, ArXiv.

[29]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[30]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[32]  Michael C. Kampffmeyer Detection of small objects , land cover mapping and modelling of uncertainty in urban remote sensing images using deep convolutional neural networks , 2016 .

[33]  Markus Gerke,et al.  The ISPRS benchmark on urban object classification and 3D building reconstruction , 2012 .

[34]  Nikos Paragios,et al.  Multitemporal Very High Resolution From Space: Outcome of the 2016 IEEE GRSS Data Fusion Contest , 2017, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[35]  Nikos Komodakis,et al.  Graph-Based Registration, Change Detection, and Classification in Very High Resolution Multitemporal Remote Sensing Data , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[36]  Xiao Xiang Zhu,et al.  Spatiotemporal scene interpretation of space videos via deep neural network and tracklet analysis , 2016, 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[37]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[38]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[40]  Xiao Xiang Zhu,et al.  Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources , 2017, IEEE Geoscience and Remote Sensing Magazine.

[41]  Xiao Xiang Zhu,et al.  Unsupervised Spectral–Spatial Feature Learning via Deep Residual Conv–Deconv Network for Hyperspectral Image Classification , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[42]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[43]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[44]  Carsten Rother,et al.  InstanceCut: From Edges to Instances with MultiCut , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Farid Melgani,et al.  Automatic Car Counting Method for Unmanned Aerial Vehicle Images , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[46]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Farid Melgani,et al.  Detecting Cars in UAV Images With a Catalog-Based Approach , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[48]  Konstantinos Karantzalos,et al.  Vehicle detection and traffic density monitoring from very high resolution satellite video data , 2015, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[49]  Uwe Stilla,et al.  Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection , 2016, ISPRS Journal of Photogrammetry and Remote Sensing.

[50]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.