Paper Information - Interactive Exploration of Journalistic Video Footage through Multimodal Semantic Matching

This demo presents a system that helps journalists explore video footage for broadcasts. Daily news broadcasts contain multiple news items that consist of many video shots, so searching for relevant footage is a labor-intensive task. Our system does not require annotated video shots: it extracts semantics from the footage and automatically matches them to the journalist's query terms. The journalist can then indicate which aspects of a query term should be emphasized, e.g., its title or its thematic meaning. The goal of the system is to support journalists in their search process by encouraging interactive exploration of the footage.
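The matching step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the abstract does not say how semantics are compared, so plain term overlap (Jaccard) stands in for the semantic matching, and the names `overlap`, `score_shot`, `rank_shots`, the aspect keys `title` and `theme`, and the example shots are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_shot(
    shot_terms: Set[str],
    query_aspects: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-aspect overlaps.

    `weights` models the journalist's emphasis on each aspect
    (assumed here to be non-negative numbers).
    """
    total = sum(weights.get(name, 0.0) for name in query_aspects)
    if total == 0:
        return 0.0
    weighted = sum(
        weights.get(name, 0.0) * overlap(shot_terms, terms)
        for name, terms in query_aspects.items()
    )
    return weighted / total


def rank_shots(
    shots: Dict[str, Set[str]],
    query_aspects: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (shot_id, score) pairs, best match first."""
    scored = [
        (shot_id, score_shot(terms, query_aspects, weights))
        for shot_id, terms in shots.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical terms extracted from unannotated shots.
    shots = {
        "shot_1": {"flood", "river", "boat"},
        "shot_2": {"parliament", "debate", "minister"},
    }
    # Hypothetical aspects of one query term.
    query_aspects = {
        "title": {"flood", "warning"},
        "theme": {"weather", "river", "rain"},
    }
    # Emphasize the title aspect only.
    for shot_id, score in rank_shots(shots, query_aspects, {"title": 1.0, "theme": 0.0}):
        print(shot_id, round(score, 3))
```

Changing the weights re-ranks the shots without re-extracting anything from the footage, which is the kind of cheap interaction loop the abstract describes. A real system would presumably replace `overlap` with a similarity over learned representations.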
