
VIREO-TNO @ TRECVID 2015: Multimedia Event Detection

This paper presents an overview and comparative analysis of our systems designed for the TRECVID 2015 [1] multimedia event detection (MED) task. We submitted 17 runs: 5 each for the zero-example, 10-example and 100-example subtasks of Pre-Specified (PS) event detection, and 2 for the 10-example subtask of Ad-Hoc (AH) event detection. We did not participate in the Interactive Run. This year we focus on three different parts of the MED task: 1) extending the size of our concept bank and combining it with improved dense trajectories; 2) exploring strategies for semantic query generation (SQG); and 3) combining our visual classifiers with audio and/or textual classifiers. Among our 17 submitted runs, the following achieved top performance:

- VIREO MED15 MED15EvalFull PS 0Ex MED p-manualfused 1: a zero-example system with manual SQG, fused with textual (OCR) and speech (ASR) information.
- VIREO MED15 MED15EvalFull PS 10Ex MED p-ConceptBankIDTEK0OCR 1: a 10-example system using our Concept-Bank feature fused with improved dense trajectories, the 0Ex manual visual system, and OCR.
- VIREO MED15 MED15EvalFull PS 100Ex MED c-ConceptBankIDTJointProb 1: a 100-example system using the Concept-Bank feature fused with improved dense trajectories, with the final score computed from the joint probability.
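The abstract does not say how the per-modality scores are combined. Purely as an illustration, the sketch below contrasts two common late-fusion rules over per-video detector scores. The first is a weighted average. The second is a product of per-modality probabilities, which is one possible reading of "joint probability" and assumes the modalities are independent and each score is already a calibrated probability. The function names, weights and example scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def average_fusion(scores: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Hypothetical late fusion: weighted average of per-modality scores for each video."""
    names = list(scores)
    n_videos = len(scores[names[0]])
    total = sum(weights[m] for m in names)
    return [sum(weights[m] * scores[m][i] for m in names) / total for i in range(n_videos)]


def joint_probability_fusion(scores: Dict[str, List[float]]) -> List[float]:
    """Hypothetical late fusion: product of per-modality probabilities.

    Assumes independent modalities and calibrated scores in (0, 1]; the product
    is accumulated in log space, with a small floor to avoid log(0).
    """
    names = list(scores)
    n_videos = len(scores[names[0]])
    eps = 1e-12
    return [
        math.exp(sum(math.log(max(scores[m][i], eps)) for m in names))
        for i in range(n_videos)
    ]


def rank(fused: List[float]) -> List[int]:
    """Return video indices ordered from highest to lowest fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Made-up scores for three videos from two hypothetical detectors.
    per_modality = {"concept_bank": [0.9, 0.2, 0.6], "idt": [0.7, 0.4, 0.8]}
    avg = average_fusion(per_modality, {"concept_bank": 1.0, "idt": 1.0})
    joint = joint_probability_fusion(per_modality)
    print(rank(avg), rank(joint))
```

The product rule penalizes a video that any single modality scores low. Averaging, by contrast, lets a strong modality compensate for a weak one.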

[1] Zsombor Paroczi, et al. Re-Ranking the Image Search Results for Relevance and Diversity in MediaEval 2014 Challenge, 2014, MediaEval.

[2] Cordelia Schmid, et al. Action Recognition with Improved Trajectories, 2013, 2013 IEEE International Conference on Computer Vision.

[3] Georges Quénot, et al. TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2011, TRECVID.

[4] Bolei Zhou, et al. Learning Deep Features for Scene Recognition using Places Database, 2014, NIPS.

[5] Christopher D. Manning, et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[6] Deyu Meng, et al. Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos, 2015, ICMR.

[7] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[8] Ray Smith. An Overview of the Tesseract OCR Engine, 2007.

[9] Benoit Huet, et al. When textual and visual information join forces for multimedia retrieval, 2014, ICMR.

[10] Paul Over, et al. Creating HAVIC: Heterogeneous Audio Visual Internet Collection, 2012, LREC.

[11] Otis Gospodnetic, et al. Lucene in Action, Second Edition: Covers Apache Lucene 3.0, 2010.

[12] Chong-Wah Ngo, et al. VIREO-TNO @ TRECVID 2014: Multimedia Event Detection and Recounting (MED and MER), 2014, TRECVID.

[13] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[14] Marti A. Hearst. Text Tiling: Segmenting Text into Multi-paragraph Subtopic Passages, 1997, CL.