Semantic Reasoning in Zero Example Video Event Retrieval

Searching digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and no visual example images or videos are available. Current research on deep neural networks is highly beneficial for retrieving high-level events when visual examples are given, but without examples it remains hard to (1) determine which concepts are useful to pre-train (the Vocabulary challenge) and (2) select which pre-trained concept detectors are relevant for a given unseen high-level event (the Concept Selection challenge). In this article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves the retrieval of high-level events in videos, and that our novel method outperforms a knowledge-based concept selection method.
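The core idea behind embedding-based concept selection can be sketched as ranking a vocabulary of pre-trained concept detectors by the similarity between their names and the textual event query in a word-embedding space. The sketch below is a minimal, hypothetical illustration only: the toy 3-d vectors stand in for a real trained word2vec model, and the query composition (averaging word vectors) and plain cosine ranking are assumptions, not the paper's actual i-w2v formulation.

```python
import numpy as np

# Toy embedding table (stand-in for a trained word2vec model; values invented).
EMB = {
    "parade":   np.array([0.9, 0.1, 0.0]),
    "marching": np.array([0.8, 0.2, 0.1]),
    "band":     np.array([0.7, 0.3, 0.0]),
    "dog":      np.array([0.0, 0.9, 0.4]),
    "kitchen":  np.array([0.1, 0.2, 0.9]),
}

def embed(words):
    """Compose a query vector by averaging the embeddings of known words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_concepts(query_words, concept_vocab, k=2):
    """Rank concepts in the vocabulary by similarity to the event query
    and return the top-k as (concept, score) pairs."""
    q = embed(query_words)
    scored = [(c, cosine(q, EMB[c])) for c in concept_vocab]
    return sorted(scored, key=lambda t: -t[1])[:k]

top = select_concepts(["parade"], ["marching", "band", "dog", "kitchen"])
print([c for c, _ in top])  # prints ['marching', 'band']
```

With real embeddings, the selected detectors' scores on each video would then be combined (e.g., weighted by these similarities) to rank videos for the zero-example event query.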
