Image Retrieval Framework Based on Dual Representation Descriptor

Descriptor aggregation techniques such as the Fisher vector and vector of locally aggregated descriptors (VLAD) are used in most image retrieval frameworks. It takes some time to extract local descriptors, and the geometric verification requires storage if a real-valued descriptor such as SIFT is used. Moreover, if we apply binary descriptors to such a framework, the performance of image retrieval is not better than if we use a real-valued descriptor. Our approach tackles these issues by using a dual representation descriptor that has advantages of being both a real-valued and a binary descriptor. The real value of the dual representation descriptor is aggregated into a VLAD in order to achieve high accuracy in the image retrieval, and the binary one is used to find correspondences in the geometric verification stage in order to reduce the amount of storage needed. We implemented a dual representation descriptor extracted in semi-real time by using the CARD descriptor. We evaluated the accuracy of our image retrieval framework including the geometric verification on three datasets (holidays, ukbench and Stanford mobile visual search). The results indicate that our framework is as accurate as the framework that uses SIFT. In addition, the experiments show that the image retrieval speed and storage requirements of our framework are as efficient as those of a framework that uses ORB. key words: image retrieval, local descriptor

[1]  Kiyoharu Aizawa,et al.  PQTable: Fast Exact Asymmetric Distance Neighbor Search for Product Quantization Using Hash Tables , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[5]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[7]  Miguel Á. Carreira-Perpiñán,et al.  Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions , 2000, Neural Computation.

[8]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[9]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[10]  David G. Lowe,et al.  Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[12]  Martha Larson,et al.  Pairwise geometric matching for large-scale object retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Dorian Gálvez-López,et al.  Real-time loop detection with bags of binary words , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Bernd Girod,et al.  Memory-Efficient Image Databases for Mobile Visual Search , 2014, IEEE MultiMedia.

[17]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Huizhong Chen,et al.  The stanford mobile visual search data set , 2011, MMSys.

[19]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Yuichi Yoshida,et al.  CARD: Compact And Real-time Descriptors , 2011, 2011 International Conference on Computer Vision.

[22]  IV. F ISHER,et al.  Image Retrieval with Fisher Vectors of Binary Features , 2013 .

[23]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[24]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[25]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.