AKULA -- Adaptive Cluster Aggregation for Visual Search

Key point features are effective tools for image matching, and aggregating key point features is an effective way to create a compact image representation for visual search. Aggregation not only achieves compression but also improves matching accuracy and indexing efficiency. Research in this area is active, and recent Fisher Vector based aggregation schemes have been shown to be very effective in a number of application scenarios. In this paper, we present a new direct aggregation scheme that adapts to the descriptor distribution of each individual image and does not enforce a single generative model, such as the Gaussian mixture model (GMM) used in Fisher Vector type aggregation. Moreover, it achieves better compression as well as better image matching accuracy. Simulation results on the image identification data set from the MPEG Compact Descriptor for Visual Search (CDVS) effort demonstrate the effectiveness of this approach.
