Learning region-wise deep feature representation for image analysis

Effective feature representation plays an important role in image analysis tasks. In recent years, deep features, instead of hand-crafted features, have become the mainstream of the representation in image analysis tasks. However, the existing deep learning methods always extract feature representations from the whole image directly. Such strategies concentrate on extracting global features, and tend to fail in capturing local geometric invariance and introduce noise information from regions of not interest. In this paper, we propose a novel region-wise deep feature extraction framework for promoting the local geometric invariance and reducing noise information. In our algorithm, object proposal is adopted to generate a set of foreground object bounding boxes, from which the pre-trained convolutional neural network model is adopted to extract region-wise deep features. Then, an improved vector of locally aggregated descriptors strategy with weighted multi-neighbor assignment is proposed to encode the local region-wise feature representations. The final feature representation is not restricted to the classification task, and can also be further quantized to hash codes for large-scale image retrieval. Extensive experiments conducted on publicly available datasets demonstrate the promising performance of our work against the state-of-the-art methods in both image retrieval and classification tasks.

[1]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Victor S. Lempitsky,et al.  Aggregating Deep Convolutional Features for Image Retrieval , 2015, ArXiv.

[3]  Atsuto Maki,et al.  Visual Instance Retrieval with Deep Convolutional Networks , 2014, ICLR.

[4]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[5]  Matthew B. Blaschko,et al.  Learning a category independent object detection cascade , 2011, 2011 International Conference on Computer Vision.

[6]  Nuno Vasconcelos,et al.  Scene classification with semantic Fisher vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[8]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[9]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[11]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Jordi Pont-Tuset,et al.  Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Matthieu Cord,et al.  WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Wei Liu,et al.  Supervised Discrete Hashing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Hanjiang Lai,et al.  Supervised Hashing for Image Retrieval via Image Representation Learning , 2014, AAAI.

[19]  Kai-Kuang Ma,et al.  HOG-assisted deep feature learning for pedestrian gender recognition , 2017, J. Frankl. Inst..

[20]  Jiwen Lu,et al.  Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[22]  Jianbo Liu,et al.  An improved Bag-of-Words framework for remote sensing image retrieval in large-scale image databases , 2015, Int. J. Digit. Earth.

[23]  Yang Liu,et al.  A robust local sparse coding method for image classification with Histogram Intersection Kernel , 2016, Neurocomputing.

[24]  Yo-Ping Huang,et al.  Image Retrieval Based on Dominant Texture Features , 2006, 2006 IEEE International Symposium on Industrial Electronics.

[25]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[26]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[28]  Chu-Song Chen,et al.  Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Hanjiang Lai,et al.  Simultaneous feature learning and hash coding with deep neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Philip S. Yu,et al.  HashNet: Deep Learning to Hash by Continuation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Xiaoyu Zhang,et al.  ListNet-based object proposals ranking , 2017, Neurocomputing.

[33]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[34]  Lei Wang,et al.  Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors , 2014, NIPS.

[35]  Limin Wang,et al.  Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.

[36]  Xiaoyu Zhang,et al.  Update vs. upgrade: Modeling with indeterminate multi-class active learning , 2015, Neurocomputing.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Ngai-Man Cheung,et al.  Selective Deep Convolutional Features for Image Retrieval , 2017, ACM Multimedia.

[39]  Shengcai Liao,et al.  Multi-label convolutional neural network based pedestrian attribute classification , 2017, Image Vis. Comput..

[40]  Nuno Vasconcelos,et al.  VLAD3: Encoding Dynamics of Deep Features for Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Larry S. Davis,et al.  Exploiting local features from deep networks for image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[42]  Cécile Barat,et al.  String representations and distances in deep Convolutional Neural Networks for image classification , 2016, Pattern Recognit..

[43]  Changsheng Li,et al.  Sparse representation for robust abnormality detection in crowded scenes , 2014, Pattern Recognit..

[44]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[45]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[46]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.