Multiple VLAD Encoding of CNNs for Image Classification

Despite the effectiveness of convolutional neural networks (CNNs), especially for image classification tasks, the representations learned from convolutional features remain limited: they focus mainly on an image's salient object and ignore the variation information from clutter and local objects. The authors propose a multiple vector of locally aggregated descriptors (VLAD) encoding method with CNN features for image classification. To improve the performance of VLAD coding, they explore the multiplicity of VLAD encoding by extending it with three encoding algorithms. Moreover, they apply the spatial pyramid patch (SPM) to VLAD encoding to add spatial information to the CNN features. The addition of SPM, in particular, allows their proposed framework to outperform the traditional method.
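
As a rough illustration of the kind of encoding the abstract refers to, the following is a minimal sketch of standard hard-assignment VLAD applied to a convolutional feature map, with a simple spatial grid used to retain coarse location information. It is not the authors' method: the abstract does not specify the three extended encoding algorithms or the exact pyramid layout, so the function names (`vlad_encode`, `spatial_pyramid_vlad`), the grid levels, the normalization steps, and the random codebook are all assumptions made for illustration. In practice the codebook would typically come from k-means over training descriptors.

```python
import numpy as np


def vlad_encode(descriptors, centers):
    """Standard hard-assignment VLAD (illustrative, not the paper's variants).

    descriptors: (n, d) local features, e.g. one per conv-map location.
    centers:     (k, d) codebook, assumed to be learned beforehand.
    Returns a (k * d,) vector of normalized residual sums.
    """
    k, d = centers.shape
    # Squared Euclidean distance from every descriptor to every center.
    diffs = descriptors[:, None, :] - centers[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate residuals to the assigned center.
            vlad[i] = (members - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Signed square-root normalization followed by L2 normalization
    # (a common choice; assumed here, not taken from the abstract).
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


def spatial_pyramid_vlad(feature_map, centers, levels=(1, 2)):
    """Concatenate one VLAD code per grid cell at each pyramid level.

    feature_map: (H, W, D) convolutional activations.
    levels:      grid sizes per level; (1, 2) is a hypothetical default.
    """
    h, w, d = feature_map.shape
    parts = []
    for g in levels:
        ys = np.linspace(0, h, g + 1, dtype=int)
        xs = np.linspace(0, w, g + 1, dtype=int)
        for a in range(g):
            for b in range(g):
                cell = feature_map[ys[a]:ys[a + 1], xs[b]:xs[b + 1]]
                parts.append(vlad_encode(cell.reshape(-1, d), centers))
    return np.concatenate(parts)


# Toy usage with random data standing in for real CNN activations.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((6, 6, 8))   # H=6, W=6, D=8
codebook = rng.standard_normal((4, 8))         # k=4 visual words
code = spatial_pyramid_vlad(feature_map, codebook)
print(code.shape)  # (1 + 4) cells * 4 centers * 8 dims = (160,)
```

The resulting vector could then be passed to a linear classifier. Concatenating per-cell codes is what adds the spatial information; with a single 1x1 level the encoding is fully orderless.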
