Event Recognition Based on Top-Down Motion Attention

How to fuse static and dynamic information is a key issue in event analysis. In this paper, a top-down, motion-guided fusion method is proposed for recognizing events in unconstrained news video. Static information is represented as a bag of SIFT features, while motion information is used to generate an event-specific attention map that directs the sampling of interest points. We build class-specific motion histograms for each event, giving more weight to the interest points that are discriminative for the corresponding event. Experimental results on the TRECVID 2005 video corpus demonstrate that the proposed method improves the mean average accuracy of recognition.
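To make the weighting idea concrete, the following minimal sketch (not the authors' code) shows one way an event-specific motion attention map could reweight SIFT interest points when building a bag-of-words histogram: each keypoint votes for its nearest visual word, and the vote is scaled by the attention value at the keypoint's location. All names and shapes (codebook, attention_map, keypoints, descriptors) are illustrative assumptions.

```python
import numpy as np

def weighted_bow_histogram(descriptors, keypoints, codebook, attention_map):
    """Bag-of-SIFT-features histogram with per-keypoint motion-attention weights.

    descriptors   : (N, 128) SIFT descriptors of the interest points
    keypoints     : (N, 2) (x, y) locations of the interest points
    codebook      : (K, 128) visual-word centroids learned offline (e.g. by k-means)
    attention_map : (H, W) event-specific motion attention, values in [0, 1]
    """
    num_words = codebook.shape[0]
    hist = np.zeros(num_words)
    h, w = attention_map.shape
    for desc, (x, y) in zip(descriptors, keypoints):
        # Assign the descriptor to its nearest visual word.
        word = np.argmin(np.linalg.norm(codebook - desc, axis=1))
        # Scale the vote by the motion attention at the keypoint location,
        # so points in motion-salient regions contribute more.
        weight = attention_map[min(int(y), h - 1), min(int(x), w - 1)]
        hist[word] += weight
    # L1-normalize so clips with different numbers of keypoints are comparable.
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In a full pipeline, one such histogram per clip would then be fed to a classifier such as an SVM; the attention map itself would come from the class-specific motion statistics described above.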
