Mining Mid-Level Visual Elements for Object Detection in High-Resolution Remote Sensing Images

The goal of mining middle-level visual elements is to discover a set of image patches that are representative of and discriminative for a target category. The commonly used mid-level feature representations such as bag-of-visual-words (BOW) models or part-based models in high-resolution remote sensing (HRS) images, seldom consider the discriminability of visual words or parts in object detection. To address this problem, we propose a novel and effective HRS image object detection method based on mid-level visual element representations. First, we employ an iterative procedure that alternates between retraining discriminative classifiers and mining for additional patch instances to discover the discriminative patches, i.e., discriminative mid-level visual elements. Then, a novel mid-level feature representation for an image is constructed based on these visual elements to achieve object detection in HRS images. The experiments on the two HRS image datasets demonstrated the effectiveness of the proposed method compared with several state-of-the-art BOW-based and part-based models.

[1]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Yu Li,et al.  Automatic Target Detection in High-Resolution Remote Sensing Images Using Spatial Sparse Coding Bag-of-Words Model , 2012, IEEE Geoscience and Remote Sensing Letters.

[5]  Alexei A. Efros,et al.  What makes Paris look like Paris? , 2015, Commun. ACM.

[6]  Jonghyun Choi,et al.  Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Abhinav Gupta,et al.  Mid-level Elements for Object Detection , 2015, ArXiv.

[8]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[9]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Junwei Han,et al.  Multi-class geospatial object detection and geographic image classification based on collection of part detectors , 2014 .

[12]  Derek Hoiem,et al.  Learning Collections of Part Models for Object Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Deren Li,et al.  Object Classification of Aerial Images With Bag-of-Visual Words , 2010, IEEE Geoscience and Remote Sensing Letters.

[14]  Jean Ponce,et al.  Learning Discriminative Part Detectors for Image Classification and Cosegmentation , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Alexei A. Efros,et al.  Mid-level Visual Element Discovery as Discriminative Mode Seeking , 2013, NIPS.

[17]  Matthieu Guillaumin,et al.  Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.

[18]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[19]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.