A Framework for Effective Known-item Search in Video

Searching for one particular scene in a large video collection (known-item search) is a challenging task for video retrieval systems. Recent results from evaluation campaigns show that even respected approaches based on machine learning often fail to solve the task easily. Hence, interactive search is recommended in addition to effective automatic multimedia annotation and embedding. This paper presents a comprehensive description of VIRET, an interactive video retrieval framework that has successfully participated in several recent evaluation campaigns. We detail the video analysis, feature extraction and retrieval models it uses, as well as several experiments evaluating the effectiveness of selected system components. We highlight the prototype's results at the Video Browser Showdown 2019 together with an analysis of the collected query logs. We conclude that the framework comprises a set of effective and efficient models for most of the evaluated known-item search tasks in 1000 hours of video, and that it could serve as a baseline reference approach. The analysis also reveals that the result presentation interface needs improvement to raise the performance of future VIRET prototypes.