Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images

Object detection in very high resolution (VHR) optical remote sensing images is a fundamental problem in remote sensing image analysis. Owing to advances in powerful feature representations, machine-learning-based object detection is receiving increasing attention. Although numerous feature representations exist, most of them are handcrafted or shallow-learning-based features, and their descriptive capability becomes limited as the object detection task grows more challenging. More recently, deep learning algorithms, especially convolutional neural networks (CNNs), have shown much stronger feature representation power in computer vision. Despite the progress made on natural scene images, directly using CNN features for object detection in optical remote sensing images is problematic because such features do not deal effectively with object rotation variations. To address this problem, this paper proposes a novel and effective approach to learn a rotation-invariant CNN (RICNN) model for advancing the performance of object detection, which is achieved by introducing and learning a new rotation-invariant layer on the basis of existing CNN architectures. Unlike the training of traditional CNN models, which optimizes only the multinomial logistic regression objective, our RICNN model is trained by optimizing a new objective function that imposes a regularization constraint. This constraint explicitly enforces the feature representations of the training samples before and after rotation to be mapped close to each other, hence achieving rotation invariance. To facilitate training, we first train the rotation-invariant layer and then domain-specifically fine-tune the whole RICNN network to further boost the performance. Comprehensive evaluations on a publicly available ten-class object detection data set demonstrate the effectiveness of the proposed method.
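The training objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the objective, so everything specific here is an assumption made for illustration: the function names, the squared-distance penalty between each sample's feature and the mean feature of its rotated copies, and the weight `lam`. It shows only the general idea of adding a rotation-invariance regularizer to a multinomial logistic regression (softmax cross-entropy) loss, not the paper's actual formulation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean multinomial logistic regression loss.

    logits: (n, c) class scores; labels: (n,) integer class indices.
    """
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def rotation_invariant_objective(feats, feats_rot, logits, logits_rot,
                                 labels, lam=0.1):
    """Classification loss plus a rotation-invariance penalty (assumed form).

    feats:      (n, d)    features of the original samples
    feats_rot:  (n, k, d) features of k rotated copies of each sample
    logits:     (n, c)    class scores of the original samples
    logits_rot: (n, k, c) class scores of the rotated copies
    labels:     (n,)      integer class indices
    lam:        weight of the invariance penalty (hypothetical parameter)
    """
    n, k, _ = feats_rot.shape

    # Classify the original and rotated samples together; each rotated
    # copy inherits the label of the sample it was generated from.
    all_logits = np.concatenate([logits, logits_rot.reshape(n * k, -1)], axis=0)
    all_labels = np.concatenate([labels, np.repeat(labels, k)])
    classification = softmax_cross_entropy(all_logits, all_labels)

    # Penalize the distance between each sample's feature and the mean
    # feature of its rotated copies, pulling the two close together.
    mean_rot = feats_rot.mean(axis=1)
    invariance = 0.5 * np.mean(np.sum((feats - mean_rot) ** 2, axis=1))

    return classification + lam * invariance, classification, invariance
```

In this sketch the penalty vanishes when the features of the rotated copies average to the feature of the original sample, so minimizing the combined loss trades classification accuracy against invariance through `lam`.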
