Video event recognition using concept attributes

We propose to use action, scene and object concepts as semantic attributes for classifying video events in "in-the-wild" content, such as YouTube videos. We model events using a variety of complementary semantic attribute features developed in a semantic concept space. Our contribution is to systematically demonstrate the advantages of this concept-based event representation (CBER) in video event classification and understanding. Specifically, CBER generalizes better, which makes it possible to recognize events from only a few training examples. In addition, CBER makes it possible to recognize a novel event without any training examples (i.e., zero-shot learning). We also show that our proposed enhanced event model further improves zero-shot learning. Finally, CBER provides a straightforward way to perform event recounting and understanding. We use the TRECVID Multimedia Event Detection (MED11) open source event definitions and datasets as our test bed and show results on over 1400 hours of video.
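To make the zero-shot idea concrete, the following is a minimal sketch, not the authors' method. It assumes each video has already been reduced to a vector of concept detector scores, and that a novel event is described only by a hand-picked set of relevant concepts. The concept vocabulary, the event definitions, the cosine scoring rule, and all function names are hypothetical illustrations.

```python
import math

# Hypothetical concept vocabulary (actions, scenes, objects); not from the paper.
CONCEPTS = ["person_running", "kitchen", "cake", "car", "outdoor_road", "candles"]

# Hypothetical event definitions: each event is described only by the concepts
# a person would expect to see, so no training videos of the event are needed.
EVENT_CONCEPTS = {
    "birthday_party": {"cake", "candles", "kitchen"},
    "changing_a_tire": {"car", "outdoor_road"},
}


def event_prototype(event):
    """Binary vector over CONCEPTS marking the concepts tied to the event."""
    return [1.0 if c in EVENT_CONCEPTS[event] else 0.0 for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zero_shot_rank(video_scores):
    """Rank events by similarity between a video's concept scores and each prototype."""
    scored = [(e, cosine(video_scores, event_prototype(e))) for e in EVENT_CONCEPTS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recount(video_scores, event):
    """List the event's concepts ordered by how strongly the video exhibits them."""
    by_name = dict(zip(CONCEPTS, video_scores))
    relevant = [(c, by_name[c]) for c in EVENT_CONCEPTS[event]]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Made-up concept detector outputs for one video, aligned with CONCEPTS.
video = [0.05, 0.70, 0.90, 0.10, 0.05, 0.80]
ranking = zero_shot_rank(video)
evidence = recount(video, ranking[0][0])
```

Because both the video and the event live in the same concept space, the top-scoring concepts returned by `recount` double as a human-readable explanation of why the event was chosen, which is the sense in which a concept-based representation lends itself to recounting.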
