Classification of video events using 4-dimensional time-compressed motion features

Among the various types of semantic concepts modeled, events pose the greatest challenge in terms of computational power needed to represent the event and accuracy that can be achieved in modeling it. We introduce a novel low-level visual feature that summarizes motion in a shot. This feature leverages motion vectors from MPEG-encoded video, and aggregates local motion vectors over time in a matrix, which we refer to as a motion image. The resulting motion image is representative of the overall motion in a video shot, having compressed the temporal dimension while preserving spatial ordering. Building motion models using this feature permits us to combine the power of discriminant modeling with the dynamics of the motion in video shots that cannot be accomplished by building generative models over a time series of motion features from multiple frames in the video shot. Evaluation of models built using several motion image features in the TRECVID 2005 dataset shows that use of this novel motion feature results an average improvement in concept detection performance by 140% over existing motion features. Furthermore, experiments also reveal that when this motion feature is combined with static feature representations of a single keyframe from the shot such as color and texture features, the fused detection results in an improvement between 4 to 12% over the fusion across the static features alone.

[1]  Haim H. Permuter,et al.  IBM Research TREC 2002 Video Retrieval System , 2002, TREC.

[2]  James W. Davis Hierarchical motion history images for recognizing human motion , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[3]  Milind R. Naphade,et al.  Discovering recurrent events in video using unsupervised methods , 2002, Proceedings. International Conference on Image Processing.

[4]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[5]  HongJiang Zhang,et al.  Motion Pattern-Based Video Classification and Retrieval , 2003, EURASIP J. Adv. Signal Process..

[6]  Jing Huang,et al.  Spatial Color Indexing and Applications , 2004, International Journal of Computer Vision.

[7]  Jack Bresenham,et al.  Algorithm for computer control of a digital plotter , 1965, IBM Syst. J..

[8]  Milind R. Naphade,et al.  Supporting audiovisual query using dynamic programming , 2001, MULTIMEDIA '01.

[9]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[10]  Marcel Worring,et al.  Multimedia event-based video indexing using time intervals , 2005, IEEE Transactions on Multimedia.

[11]  Neill W Campbell,et al.  Proceedings of the TRECVID 2006 Workshop , 2006 .

[12]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[13]  Cheng Lu,et al.  Classification of summarized videos using hidden markov models on compressed chromaticity signatures , 2001, MULTIMEDIA '01.

[14]  Lihi Zelnik-Manor,et al.  Statistical analysis of dynamic actions , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Bernd Freisleben,et al.  University of Marburg at TRECVID 2005: Shot Boundary Detection and Camera Motion Estimation Results , 2005, TRECVID.