Combining Fisher Vector and Convolutional Neural Networks for Image Retrieval

Fisher Vector (FV) and deep Convolutional Neural Network (CNN) are two popular approaches for extracting effective image representations. FV aggregates local features (e.g., SIFT) and was the state of the art before the recent success of deep learning approaches. Recently, the combination of FV and CNN has been investigated; however, only FVs built by aggregating SIFT have been tested. In this work, we propose combining CNN with an FV built upon binary local features, called BMM-FV. The results show that combining BMM-FV with CNN improves the retrieval performance of the latter, and does so with less computational effort than using the traditional FV, which relies on non-binary features.
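
The abstract does not state how the two representations are combined. As a minimal sketch, assuming a late fusion in which the CNN and FV similarities are mixed with a convex weight, the combination might look like the following. The function names, the dictionary layout, and the weight `alpha` are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit L2 norm (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def fused_similarity(query, item, alpha=0.5):
    """Convex combination of CNN and FV similarities.

    `query` and `item` are dicts with a 'cnn' and an 'fv' descriptor.
    `alpha` is an assumed fusion weight: 1.0 uses only the CNN descriptor,
    0.0 uses only the FV descriptor.
    """
    s_cnn = cosine_similarity(query["cnn"], item["cnn"])
    s_fv = cosine_similarity(query["fv"], item["fv"])
    return alpha * s_cnn + (1.0 - alpha) * s_fv


def rank(query, database, alpha=0.5):
    """Return database indices sorted by decreasing fused similarity."""
    scores = [(fused_similarity(query, item, alpha), idx)
              for idx, item in enumerate(database)]
    return [idx for _, idx in sorted(scores, key=lambda p: p[0], reverse=True)]
```

In this sketch the FV slot can hold either a BMM-FV or a traditional FV; only the cost of computing that descriptor changes, not the fusion step.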
