Encoding Concept Prototypes for Video Event Detection and Summarization

This paper proposes a new semantic video representation for few- and zero-example event detection and for unsupervised video event summarization. Unlike existing works, which obtain a semantic representation by training concept detectors on images or on entire video clips, we propose an algorithm that learns a set of relevant frames as concept prototypes from web video examples, without the need for frame-level annotations, and uses them to represent an event video. We formulate the problem of learning the concept prototypes as seeking the frames closest to the densest region in the feature space of video frames drawn from both the positive and the negative training videos of a target concept. We study the behavior of our concept-prototype-based video event representation through three experiments on challenging web videos from the TRECVID 2013 multimedia event detection task and the MED-summaries dataset. Our experiments establish that: i) event detection accuracy increases when each video is mapped into concept prototype space; ii) zero-example event detection accuracy increases when each frame of a video is analyzed individually in concept prototype space, rather than when the video is considered holistically; and iii) unsupervised video event summarization using concept prototypes is more accurate than summarization using video-level concept detectors.
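The prototype-learning step can be illustrated with a small sketch. This is not the authors' implementation. It assumes each video is already a list of frame feature vectors. As its "density" criterion it swaps in a Diverse-Density-style score, the noisy-or model from multiple-instance learning: a point scores highly when every positive video has a frame near it and no negative video does. Candidate points are restricted to frames of the positive videos, so each prototype is an actual frame. The Gaussian similarity, the `scale` parameter, and the function names (`learn_prototypes`, `encode_video`, `summarize`) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]   # one precomputed frame feature vector (assumed input)
Video = List[Frame]       # a video is represented as a list of frame features


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two frame features."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity(a: Frame, b: Frame, scale: float = 1.0) -> float:
    """Gaussian similarity in (0, 1]; a hypothetical choice of kernel."""
    return math.exp(-scale * sq_dist(a, b))


def match_prob(t: Frame, video: Video, scale: float) -> float:
    """Noisy-or probability that at least one frame of `video` lies near point t."""
    miss = 1.0
    for frame in video:
        miss *= 1.0 - similarity(frame, t, scale)
    return 1.0 - miss


def density_score(t: Frame, positives: List[Video], negatives: List[Video],
                  scale: float = 1.0) -> float:
    """Log Diverse-Density-style score (assumed criterion, not the paper's exact
    objective): high when every positive video has a frame near t and no
    negative video does."""
    eps = 1e-12  # clamp to avoid log(0)
    score = 0.0
    for video in positives:
        score += math.log(max(match_prob(t, video, scale), eps))
    for video in negatives:
        score += math.log(max(1.0 - match_prob(t, video, scale), eps))
    return score


def learn_prototypes(positives: List[Video], negatives: List[Video],
                     k: int = 1, scale: float = 1.0) -> List[Frame]:
    """Score every frame of the positive videos and keep the k best as prototypes.
    Restricting candidates to real frames means each prototype is an actual frame."""
    candidates = [frame for video in positives for frame in video]
    ranked = sorted(candidates,
                    key=lambda f: density_score(f, positives, negatives, scale),
                    reverse=True)
    return ranked[:k]


def encode_video(video: Video, prototypes: List[Frame],
                 scale: float = 1.0) -> List[float]:
    """Map a video into prototype space: for each prototype, its best frame match."""
    return [max(similarity(f, p, scale) for f in video) for p in prototypes]


def summarize(video: Video, prototypes: List[Frame], n: int = 1,
              scale: float = 1.0) -> List[int]:
    """Hypothetical summarizer: score each frame individually by its closest
    prototype and return the indices of the n best frames in temporal order."""
    scores = [max(similarity(f, p, scale) for p in prototypes) for f in video]
    best = sorted(range(len(video)), key=lambda i: scores[i], reverse=True)[:n]
    return sorted(best)
```

In this sketch, `encode_video` produces one similarity per prototype, which could feed a downstream event classifier. `summarize` scores frames one at a time rather than pooling the whole clip, which loosely mirrors the per-frame analysis the abstract credits for the zero-example and summarization gains. Restricting candidates to observed frames trades some optimality (the true density peak may lie between frames) for prototypes that can be shown to a user directly.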
