Semantic Reasoning in Zero Example Video Event Retrieval

Searching digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and no visual example images or videos are available. Current research on deep neural networks is highly beneficial for retrieving high-level events when visual examples are given, but without examples it remains hard to (1) determine which concepts are useful to pre-train (the Vocabulary challenge) and (2) select which pre-trained concept detectors are relevant for a given unseen high-level event (the Concept Selection challenge). In this article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves the retrieval of high-level events in videos, and that our novel method outperforms a knowledge-based concept selection method.
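The core idea behind embedding-based concept selection can be sketched as ranking a vocabulary of pre-trained concept detectors by the similarity between their names and the textual event query in a word-embedding space. The sketch below is a minimal, hypothetical illustration only: the toy 3-d vectors stand in for a real trained word2vec model, and the query composition (averaging word vectors) and plain cosine ranking are assumptions, not the paper's actual i-w2v formulation.

```python
import numpy as np

# Toy embedding table (stand-in for a trained word2vec model; values invented).
EMB = {
    "parade":   np.array([0.9, 0.1, 0.0]),
    "marching": np.array([0.8, 0.2, 0.1]),
    "band":     np.array([0.7, 0.3, 0.0]),
    "dog":      np.array([0.0, 0.9, 0.4]),
    "kitchen":  np.array([0.1, 0.2, 0.9]),
}

def embed(words):
    """Compose a query vector by averaging the embeddings of known words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_concepts(query_words, concept_vocab, k=2):
    """Rank concepts in the vocabulary by similarity to the event query
    and return the top-k as (concept, score) pairs."""
    q = embed(query_words)
    scored = [(c, cosine(q, EMB[c])) for c in concept_vocab]
    return sorted(scored, key=lambda t: -t[1])[:k]

top = select_concepts(["parade"], ["marching", "band", "dog", "kitchen"])
print([c for c, _ in top])  # prints ['marching', 'band']
```

With real embeddings, the selected detectors' scores on each video would then be combined (e.g., weighted by these similarities) to rank videos for the zero-example event query.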
