Few-Example Video Event Retrieval using Tag Propagation

An emerging topic in multimedia retrieval is to detect a complex event in video using only a handful of video examples. Different from existing work which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need of any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video we can rely on off-the-shelf language models to rank these videos by the tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one and multiple query examples that beats a recent alternative.

[1]  Koen E. A. van de Sande,et al.  Recommendations for video event recognition using concept vocabularies , 2013, ICMR.

[2]  Shuang Wu,et al.  Multimodal feature fusion for robust event detection in web videos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[4]  Mark Sanderson,et al.  Automatic video tagging using content redundancy , 2009, SIGIR.

[5]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[6]  Alberto Del Bimbo,et al.  Enriching and localizing semantic tags in internet videos , 2011, ACM Multimedia.

[7]  Masoud Mazloom,et al.  Querying for video events by semantic signatures from few examples , 2013, MM '13.

[8]  Gang Hua,et al.  Semantic Model Vectors for Complex Video Event Recognition , 2012, IEEE Transactions on Multimedia.

[9]  Nuno Vasconcelos,et al.  Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.

[10]  Wei Liu,et al.  Double Fusion for Multimedia Event Detection , 2012, MMM.

[11]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[13]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.