Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018

This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS) competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the searched scenes were either previously observed or described by another person (i.e., an example shot is not available). During the event, nine teams competed with their video retrieval tools in providing access to a shared video collection with 600 hours of video content. Evaluation objectives, rules, scoring, tasks, and all participating tools are described in the article. In addition, we provide some insights into how the different teams interacted with their video browsers, which was made possible by a novel interaction logging mechanism introduced for this iteration of the VBS. The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches that were showcased during the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, the tools were mostly able to find at least one scene, whereas recall was the issue for many teams. The logs also reveal that even though recent exciting advances in machine learning narrow the classical semantic gap problem, user-centric interfaces are still required to mediate access to specific content. Finally, open challenges and lessons learned are presented for future VBS events.
