VIREO-TNO @ TRECVID 2015: Multimedia Event Detection

This paper presents an overview and comparative analysis of our systems designed for the TRECVID 2015 [1] multimedia event detection (MED) task. We submitted 17 runs: 5 each for the zero-example, 10-example and 100-example subtasks of Pre-Specified (PS) event detection, and 2 for the 10-example subtask of Ad-Hoc (AH) event detection. We did not participate in the Interactive Run. This year we focused on three aspects of the MED task: 1) extending the size of our concept bank and combining it with improved dense trajectories; 2) exploring strategies for semantic query generation (SQG); and 3) combining our visual classifiers with audio and/or textual classifiers. Among our 17 submitted runs, the following achieved top performance:
- VIREO MED15 MED15EvalFull PS 0Ex MED p-manualfused 1: zero-example system with manual SQG, fused with textual (OCR) and speech (ASR) information.
- VIREO MED15 MED15EvalFull PS 10Ex MED p-ConceptBankIDTEK0OCR 1: 10-example system using our Concept-Bank feature fused with the improved dense trajectories, the 0Ex manual visual system, and OCR.
- VIREO MED15 MED15EvalFull PS 100Ex MED c-ConceptBankIDTJointProb 1: 100-example system using our Concept-Bank feature fused with the improved dense trajectories, with the final score computed as a joint probability.
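To make the idea of scoring by joint probability concrete, here is a minimal sketch of one way two classifier outputs could be fused. It is an illustration under stated assumptions, not the system described above: it assumes each classifier already outputs a calibrated probability per video and that the two are treated as independent, so the joint probability is their product. The function names, video identifiers, and scores are all hypothetical.

```python
# Illustrative sketch only: fuse two per-video probabilities by multiplication,
# which equals the joint probability under an independence assumption.
# All names and numbers below are hypothetical.

def fuse_joint(concept_bank_probs, idt_probs):
    """Return the element-wise product of two probability lists."""
    if len(concept_bank_probs) != len(idt_probs):
        raise ValueError("both classifiers must score the same videos")
    return [p * q for p, q in zip(concept_bank_probs, idt_probs)]


def rank(video_ids, scores):
    """Return video ids sorted by descending fused score."""
    pairs = sorted(zip(video_ids, scores), key=lambda t: t[1], reverse=True)
    return [video_id for video_id, _ in pairs]


video_ids = ["v1", "v2", "v3"]
concept_bank = [0.9, 0.6, 0.2]   # hypothetical Concept-Bank classifier outputs
idt = [0.5, 0.8, 0.1]            # hypothetical dense-trajectory classifier outputs

fused = fuse_joint(concept_bank, idt)
ranking = rank(video_ids, fused)
print(ranking)
```

A product rewards videos on which both classifiers agree, whereas a plain average would let a single confident classifier dominate; which behaviour is preferable depends on how well the individual scores are calibrated.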

[1]  Zsombor Paroczi et al.  Re-Ranking the Image Search Results for Relevance and Diversity in MediaEval 2014 Challenge, 2014, MediaEval.

[2]  Cordelia Schmid et al.  Action Recognition with Improved Trajectories, 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Georges Quénot et al.  TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2011, TRECVID.

[4]  Bolei Zhou et al.  Learning Deep Features for Scene Recognition using Places Database, 2014, NIPS.

[5]  Christopher D. Manning et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[6]  Deyu Meng et al.  Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos, 2015, ICMR.

[7]  Jeffrey Dean et al.  Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[8]  Ray Smith.  An Overview of the Tesseract OCR Engine, 2007.

[9]  Benoit Huet et al.  When textual and visual information join forces for multimedia retrieval, 2014, ICMR.

[10]  Paul Over et al.  Creating HAVIC: Heterogeneous Audio Visual Internet Collection, 2012, LREC.

[11]  Otis Gospodnetic et al.  Lucene in Action, Second Edition: Covers Apache Lucene 3.0, 2010.

[12]  Chong-Wah Ngo et al.  VIREO-TNO @ TRECVID 2014: Multimedia Event Detection and Recounting (MED and MER), 2014, TRECVID.

[13]  Jeffrey Dean et al.  Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[14]  Marti A. Hearst.  TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages, 1997, CL.