Large-Scale Query-by-Image Video Retrieval Using Bloom Filters

We consider the problem of retrieving videos from a database using image queries. Our focus is on large-scale applications, where indexing each database video frame independently is infeasible. Our main contribution is a framework based on Bloom filters that can index long video segments, enabling efficient image-to-video comparisons. Using this framework, we investigate several retrieval architectures by considering different types of aggregation and different functions for encoding visual information; both choices play a crucial role in achieving high performance. Extensive experiments show that the proposed technique improves mean average precision by 24% on a public dataset while being 4× faster than the previous state of the art.
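
To make the general idea concrete, the following is a minimal sketch of how a single Bloom filter could summarize a whole video segment so that a query image is compared against the segment rather than against each frame. It is not the architecture evaluated in the paper: the class `BloomFilter`, the helpers `index_segment` and `score`, the filter size, the SHA-256-based hashing, and the string "codes" standing in for quantized visual features are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over hashable codes (illustrative, not the paper's design)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real index would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True for every inserted item; may also be True for others (false positives).
        return all(self.bits[pos] for pos in self._positions(item))


def index_segment(frames, num_bits=1024, num_hashes=3):
    """Aggregate the codes of all frames in a segment into one filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for frame_codes in frames:
        for code in frame_codes:
            bloom.add(code)
    return bloom


def score(query_codes, bloom):
    """Fraction of query codes that the segment's filter reports as present."""
    if not query_codes:
        return 0.0
    hits = sum(1 for code in query_codes if code in bloom)
    return hits / len(query_codes)


# Hypothetical quantized features: each frame is a set of string codes.
segment_a = index_segment([{"w12", "w40"}, {"w40", "w77"}, {"w90"}])
segment_b = index_segment([{"w03", "w55"}, {"w61"}])

query = ["w12", "w77", "w90"]
ranking = sorted(
    [("segment_a", score(query, segment_a)), ("segment_b", score(query, segment_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```

In this sketch the cost of a query depends on the number of segments, not the number of frames, which is the property that makes segment-level indexing attractive at scale. A Bloom filter never misses a code that was inserted, so a segment containing every query code always scores 1.0; false positives can only inflate the scores of other segments.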
