On the interoperability of local descriptors compression

Several component technologies are useful for visual search, including the format of visual descriptors, the descriptor extraction process, and the indexing and matching algorithms. At a minimum, the descriptor format and parts of the extraction process should be defined to ensure interoperability. In this paper, we study the problem of interoperability among local descriptors compressed at different bit rates, that is, allowing effective and efficient comparison of compact descriptors, which is fundamentally important to mobile visual search applications. We propose to combine a feature transform with multi-stage vector quantization to achieve interoperability of compact local descriptors. First, an orthogonal transform (e.g., principal component analysis, PCA) is employed to eliminate the correlation between local feature dimensions. This improves the performance of compressed-domain descriptor matching, because distances are computed over well-aligned dimensions that are sorted by importance in the transform space. Second, multi-stage vector quantization (MSVQ) is applied to generate compact codes for local descriptors. Using lightweight quantization tables, MSVQ exploits the transform-domain features to allocate a separate bit budget to each group of transformed feature dimensions. Interoperability between descriptors compressed at different bit rates is achieved by fast matching in the orthogonal feature space. In other words, decoding descriptors back into the original feature space (SIFT space) is unnecessary, as the distance can be calculated from pre-computed lookup tables. Such efficient matching in the transform domain is particularly significant for large-scale visual search. Over a set of benchmark datasets, we report superior performance over state-of-the-art methods.
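The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain PCA computed by SVD, a residual (multi-stage) quantizer trained with a small hand-written k-means, and lookup tables of centroid inner products. The per-group bit allocation mentioned in the abstract is omitted, random vectors stand in for SIFT descriptors, and all function names (`fit_pca`, `fit_msvq`, `encode`, `build_tables`, `table_sqdist`) are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def kmeans(X, k, iters=10, seed=0):
    """Tiny Lloyd's k-means, sufficient for a sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def fit_msvq(Z, n_stages, k):
    """Train one codebook per stage on the residual left by earlier stages."""
    codebooks, residual = [], Z.copy()
    for s in range(n_stages):
        C = kmeans(residual, k, seed=s)
        idx = ((residual[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        residual = residual - C[idx]
        codebooks.append(C)
    return codebooks

def encode(z, codebooks, n_stages):
    """Encode with the first n_stages codebooks; fewer stages = lower bit rate."""
    codes, residual = [], z.copy()
    for C in codebooks[:n_stages]:
        i = int(((residual - C) ** 2).sum(axis=1).argmin())
        codes.append(i)
        residual = residual - C[i]
    return codes

def decode(codes, codebooks):
    """Reconstruct in the transform space (used here only for verification)."""
    return sum(codebooks[s][i] for s, i in enumerate(codes))

def build_tables(codebooks):
    """tables[s][t][a, b] = inner product of codebooks[s][a] and codebooks[t][b]."""
    return [[Cs @ Ct.T for Ct in codebooks] for Cs in codebooks]

def table_dot(a, b, tables):
    return sum(tables[s][t][i, j] for s, i in enumerate(a) for t, j in enumerate(b))

def table_sqdist(a, b, tables):
    """Squared distance between two codes using table lookups only."""
    return table_dot(a, a, tables) + table_dot(b, b, tables) - 2 * table_dot(a, b, tables)

# Random 32-d vectors stand in for real local descriptors (assumption).
rng = np.random.default_rng(0)
X = rng.random((500, 32))
mean, P = fit_pca(X, 16)
Z = (X - mean) @ P.T                      # descriptors in the transform space
codebooks = fit_msvq(Z, n_stages=3, k=16)
tables = build_tables(codebooks)

code_low = encode(Z[0], codebooks, 1)     # low bit rate: 1 stage
code_high = encode(Z[1], codebooks, 3)    # higher bit rate: 3 stages
d_table = table_sqdist(code_low, code_high, tables)
d_decoded = float(((decode(code_low, codebooks) - decode(code_high, codebooks)) ** 2).sum())
print(d_table, d_decoded)
```

In this sketch the two codes use a different number of stages yet are compared directly, and the table-based distance agrees with the distance between the decoded vectors up to floating-point error. The cost of a comparison depends only on the number of stages, not on the descriptor dimension.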
