Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images

Like computer vision before, remote sensing has been radically changed by the introduction of deep learning and, more notably, Convolution Neural Networks. Land cover classification, object detection and scene understanding in aerial images rely more and more on deep networks to achieve new state-of-the-art results. Recent architectures such as Fully Convolutional Networks can even produce pixel level annotations for semantic mapping. In this work, we present a deep-learning based segment-before-detect method for segmentation and subsequent detection and classification of several varieties of wheeled vehicles in high resolution remote sensing images. This allows us to investigate object detection and classification on a complex dataset made up of visually similar classes, and to demonstrate the relevance of such a subclass modeling approach. Especially, we want to show that deep learning is also suitable for object-oriented analysis of Earth Observation data as effective object detection can be obtained as a byproduct of accurate semantic segmentation. First, we train a deep fully convolutional network on the ISPRS Potsdam and the NZAM/ONERA Christchurch datasets and show how the learnt semantic maps can be used to extract precise segmentation of vehicles. Then, we show that those maps are accurate enough to perform vehicle detection by simple connected component extraction. This allows us to study the repartition of vehicles in the city. Finally, we train a Convolutional Neural Network to perform vehicle classification on the VEDAI dataset, and transfer its knowledge to classify the individual vehicle instances that we detected.

[1]  Bertrand Le Saux,et al.  On the usability of deep networks for object-based image analysis , 2016, ArXiv.

[2]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[3]  Richard S. Zemel,et al.  End-to-End Instance Segmentation and Counting with Recurrent Attention , 2016, ArXiv.

[4]  Frédéric Jurie,et al.  Vehicle detection in aerial imagery : A small target detection benchmark , 2016, J. Vis. Commun. Image Represent..

[5]  David M. Booth,et al.  Pose-invariant vehicle identification in aerial electro-optical imagery , 2015, Machine Vision and Applications.

[6]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[8]  Jamie Sherrah,et al.  Effective semantic pixel labelling with convolutional networks and Conditional Random Fields , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Bertrand Le Saux,et al.  How useful is region-based classification of remote sensing images in a deep learning framework? , 2016, 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[10]  P. Gong,et al.  Object-based Detection and Classification of Vehicles from High-resolution Aerial Photography , 2009 .

[11]  Bertrand Le Saux,et al.  Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks , 2016, ACCV.

[12]  Jefersson Alex dos Santos,et al.  Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[14]  Uwe Stilla,et al.  SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS , 2016 .

[15]  Michael Cramer,et al.  The DGPF-Test on Digital Airborne Camera Evaluation - Over- view and Test Design , 2010 .

[16]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[17]  Pierre Alliez,et al.  Fully convolutional neural networks for remote sensing image classification , 2016, 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[18]  Shiming Xiang,et al.  Vehicle Detection in Satellite Images by Hybrid Deep Convolutional Neural Networks , 2014, IEEE Geoscience and Remote Sensing Letters.

[19]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Uwe Stilla,et al.  Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection , 2016, ISPRS Journal of Photogrammetry and Remote Sensing.

[21]  Lorenzo Bruzzone,et al.  Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent Advances , 2016, IEEE Geoscience and Remote Sensing Magazine.

[22]  Terrence Fong,et al.  Vehicle detection from aerial imagery , 2011, 2011 IEEE International Conference on Robotics and Automation.

[23]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Julien Michel,et al.  Local feature based supervised object detection: Sampling, learning and detection strategies , 2011, 2011 IEEE International Geoscience and Remote Sensing Symposium.

[26]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[29]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[30]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[31]  Jamie Sherrah,et al.  Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery , 2016, ArXiv.

[32]  Markus Gerke,et al.  The ISPRS benchmark on urban object classification and 3D building reconstruction , 2012 .

[33]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Jamie Sherrah,et al.  Aerial Car Detection and Urban Understanding , 2015, 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[35]  Alexandre Boulch,et al.  Processing of Extremely High-Resolution LiDAR and RGB Data: Outcome of the 2015 IEEE GRSS Data Fusion Contest–Part A: 2-D Contest , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[36]  Serge Beucher,et al.  The Morphological Approach to Segmentation: The Watershed Transformation , 2018, Mathematical Morphology in Image Processing.

[37]  Zhenfeng Shao,et al.  Deep feature representations for high-resolution remote sensing scene classification , 2016, 2016 4th International Workshop on Earth Observation and Remote Sensing Applications (EORSA).

[38]  Alexandre Boulch,et al.  Benchmarking classification of earth-observation data: From learning explicit features to convolutional networks , 2015, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

[39]  Marin Ferecatu,et al.  Urban structure detection with deformable part-based models , 2013, 2013 IEEE International Geoscience and Remote Sensing Symposium - IGARSS.

[40]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[41]  Fatos T. Yarman-Vural,et al.  Representation Learning for Contextual Object and Region Detection in Remote Sensing , 2014, 2014 22nd International Conference on Pattern Recognition.

[42]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[43]  Line Eikvil,et al.  Classification-based vehicle detection in high-resolution satellite images , 2009 .