Accurate off-line query expansion for large-scale mobile visual search

Mobile visual search is a new class of applications that use images taken with camera phones to initiate search queries. The task is challenging mainly because of affine image transformations caused by viewpoint changes and motion blur caused by hand tremor. These problems are unavoidable in mobile visual search and often result in low recall. Query expansion is an effective strategy for improving recall, but existing methods consume large amounts of memory and time and often involve many redundant features. By integrating robust local patch mining with geometric parameter coding, this paper proposes an accurate offline query expansion method for large-scale mobile visual search. Concretely, a novel criterion is presented for evaluating and mining robust patches. Multiple representative features are then extracted from the selected local patches to handle viewpoint changes. In addition, the geometric parameter of each representative viewpoint is recorded to support fast and accurate feature matching. Experimental results on several well-known datasets and a large image set (1M images) demonstrate the effectiveness and efficiency of the method, especially its high robustness to viewpoint changes. The proposed approach may also generalize to other multimedia content analysis tasks.
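The abstract does not specify the patch criterion or the coding scheme, so the following is only a minimal sketch of the general idea of enumerating simulated viewpoints offline and keeping each one's geometric parameters next to the feature it produces. It assumes an ASIFT-style tilt/rotation sampling as a stand-in for the paper's representative viewpoints; the names `ViewParam`, `simulated_views`, and `affine_matrix`, the tilt set, and the 72-degree base step are all illustrative assumptions, not the authors' method.

```python
import math
from typing import List, NamedTuple, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


class ViewParam(NamedTuple):
    """Hypothetical geometric parameters recorded for one simulated viewpoint."""
    tilt: float      # t >= 1; t = 1 means a frontal view
    rotation: float  # in-plane rotation phi, in radians


def simulated_views(tilts=(1.0, math.sqrt(2.0), 2.0),
                    base_step_deg: float = 72.0) -> List[ViewParam]:
    """Enumerate (tilt, rotation) pairs; larger tilts get finer rotation steps."""
    views: List[ViewParam] = []
    for t in tilts:
        if t == 1.0:
            # A frontal view has no tilt direction, so one sample is enough.
            views.append(ViewParam(t, 0.0))
            continue
        step_deg = base_step_deg / t
        k = 0
        while k * step_deg < 180.0:
            views.append(ViewParam(t, math.radians(k * step_deg)))
            k += 1
    return views


def affine_matrix(view: ViewParam) -> Matrix2:
    """Rotate by phi, then compress the y axis by 1/t (assumed tilt model)."""
    c, s = math.cos(view.rotation), math.sin(view.rotation)
    return ((c, -s), (s / view.tilt, c / view.tilt))


def warp_point(m: Matrix2, x: float, y: float) -> Tuple[float, float]:
    """Apply a 2x2 affine matrix to a patch-centred coordinate."""
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


if __name__ == "__main__":
    for v in simulated_views():
        print(v, warp_point(affine_matrix(v), 1.0, 0.0))
```

In an offline pipeline along these lines, a descriptor would be extracted from each warped patch and indexed together with its `ViewParam`, so that matching can use the stored parameters as a geometric check rather than simulating views at query time.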
