Efficient video search using image queries

We study the challenges of image-based retrieval when the database consists of videos. This variation of visual search is important for a broad range of applications that require indexing video databases by their visual content. We present new solutions that reduce storage requirements while also improving video search quality. The video database is preprocessed to find different appearances of the same visual elements and to build robust descriptors. Compression algorithms are developed to reduce the system's storage requirements. We introduce a dataset of CNN broadcasts, with queries that include photos taken with mobile phones and images of objects. Our experiments cover both pairwise matching and retrieval scenarios. We demonstrate a storage reduction of one order of magnitude and search quality improvements of up to 12% in mean average precision, compared to a baseline system that does not use our techniques.
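For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision is commonly computed for a retrieval system. It is not the authors' evaluation code: the function names, the use of string identifiers for video clips, and the convention of dividing by the total number of relevant items are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked result list.

    Assumption: precision is accumulated at each rank where a relevant
    item appears, then divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical example: two image queries against a video database,
# with clip identifiers invented for illustration.
example_runs = [
    (["clip_a", "clip_x", "clip_b"], {"clip_a", "clip_b"}),
    (["clip_x", "clip_c"], {"clip_c"}),
]
example_map = mean_average_precision(example_runs)
```

Under this convention, a relevant clip that is never retrieved lowers the score, so the metric rewards both ranking relevant clips early and retrieving all of them.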
