Searching for images by video

Image retrieval based on the query-by-example (QBE) principle is still not reliable enough, largely because of likely variations in capture conditions (e.g., lighting, blur, scale, occlusion) and viewpoint between the query image and the images in the collection. In this paper, we propose a framework that explicitly addresses this problem to improve the reliability of QBE-based image retrieval. We target a use scenario in which a user captures the query object with a mobile device and requests information from the database that augments the query. Reliability is improved by allowing the user to submit a short video clip, rather than a single image, as the query. Since a video clip may combine appearances of an object or scene captured from different viewpoints and under different conditions, the rich information it contains can be exploited to discover a proper query representation and to improve the relevance of the retrieved results. Experimental results show that video-based image retrieval (VBIR) is significantly more reliable than retrieval using a single image as the query. Furthermore, to make the framework deployable in a practical mobile image retrieval system, where real-time query response is required, we also propose a priority-queue-based feature description scheme and a cache-based bi-quantization algorithm for an efficient parallel implementation of the VBIR concept.
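The abstract does not specify how the frames of a clip are combined into one query representation. As a minimal sketch, assuming a generic bag-of-visual-words pipeline with tf-idf weighting and cosine ranking (an assumption, not necessarily the authors' method), descriptors from every frame can be quantized against a visual vocabulary and pooled into a single query histogram. All function names, the toy vocabulary, and the brute-force nearest-word search are hypothetical illustrations; the priority-queue description scheme and the cache-based bi-quantization are not modeled here.

```python
import math
from collections import Counter


def quantize(descriptor, vocabulary):
    """Return the index of the nearest visual word (brute-force squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def video_query_histogram(frames, vocabulary):
    """Pool visual-word counts over all frames of the query clip (assumed pooling rule)."""
    counts = Counter()
    for frame_descriptors in frames:
        counts.update(quantize(d, vocabulary) for d in frame_descriptors)
    return counts


def inverse_document_frequency(database, vocab_size):
    """idf(w) = log(N / df(w)); words that never occur in the database get weight 0."""
    n = len(database)
    idf = {}
    for w in range(vocab_size):
        df = sum(1 for hist in database.values() if hist.get(w, 0) > 0)
        idf[w] = math.log(n / df) if df else 0.0
    return idf


def tfidf(counts, idf):
    """L2-normalized tf-idf vector stored as a sparse dict."""
    vec = {w: c * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def rank_database(frames, vocabulary, database, idf):
    """Rank database image ids by cosine similarity to the pooled video query."""
    query = tfidf(video_query_histogram(frames, vocabulary), idf)
    scores = {img: cosine(query, tfidf(hist, idf)) for img, hist in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # toy 2-D "descriptors"
    db = {"a": Counter({0: 2, 1: 1}), "b": Counter({2: 3}), "c": Counter({1: 2})}
    clip = [[(0.5, 0.2), (9.0, 1.0)], [(0.1, 0.1)]]  # two frames of a query clip
    idf = inverse_document_frequency(db, len(vocab))
    print(rank_database(clip, vocab, db, idf))
```

Pooling counts across frames lets visual words that appear only in some views still contribute to the query, while words that recur across frames receive more weight; whether this matches the paper's query-representation step is an open assumption.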
