TRECVID 2006 Rushes Exploitation by CAS MCG

For the rushes exploitation task of TRECVID 2006, we propose a novel, interactive method for selecting and editing rushes video based on hierarchical browsing of key frames. High-level features of each key frame (face, interview, person, crowd, building, outdoor, and waterbody), together with information about redundancy and repetition, are displayed at the same time to help editors select the material they want. For high-level feature extraction, we propose a multi-modal interview detection method based on audio classification and face detection, and a new repetition detection method based on spatio-temporal slices. We also detect concepts such as crowd, building, outdoor, and waterbody using SVM classifiers. Additionally, we characterize rushes by categorizing camera motion in order to infer intention. Because high-level feature extraction is difficult and editors' requirements are diverse, hierarchical browsing combined with the extracted information may be a good choice for rushes exploitation.
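
To make the idea of a spatio-temporal slice concrete, the following is a minimal sketch and not the method used in this work. It assumes grayscale frames stored as nested lists, takes the middle row of every frame as the slice, and compares two shots by mean absolute difference. The function names and the threshold value are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def horizontal_slice(frames: Sequence[Frame]) -> List[List[float]]:
    """Stack the middle row of every frame into a time-by-width image."""
    return [list(frame[len(frame) // 2]) for frame in frames]


def slice_distance(a: Sequence[Sequence[float]],
                   b: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference of two slices over their common length."""
    total = 0.0
    count = 0
    # zip stops at the shorter slice, so shots of unequal length still compare.
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("slices have no pixels in common")
    return total / count


def is_repetition(shot_a: Sequence[Frame], shot_b: Sequence[Frame],
                  threshold: float = 10.0) -> bool:
    """Flag two shots as repeated takes if their slices are close.

    The threshold is an illustrative value, not one taken from the paper.
    """
    distance = slice_distance(horizontal_slice(shot_a), horizontal_slice(shot_b))
    return distance < threshold
```

Because a slice reduces each frame to a single row, two takes of the same scene can be compared at a small fraction of the cost of full-frame matching, at the price of ignoring everything outside the sampled row.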
