Video shot classification with concept detection

Classifying video shots into a predefined set of genres according to their semantic content is a challenging task, and one that benefits video indexing, summarization and retrieval. This research proposes a novel shot classification algorithm with concept detection for news video programs. Six semantic shot types are studied: Anchorperson, Monologue, Reporter, Commercial, Still image and Miscellaneous. Anchorperson shots are detected by clustering methods, reporter and monologue shots are distinguished by Conditional Random Fields (CRFs), and the last three categories are identified by rule-based methods. Multimodal features are employed, including visual, audio, face, temporal and contextual features. Experimental results demonstrate the effectiveness of the approach, which achieves an average accuracy of 96.5%.
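
The division of labour among the three kinds of classifier can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the order of the stages, the function names (`classify_shots`, `find_anchor_shots`, `rule_label`, `label_sequence`) and the feature keys in the toy data are all assumptions made for illustration, and the three detectors are passed in as placeholder callables rather than real clustering, rule-based or CRF models.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence, Set


class ShotType(Enum):
    ANCHORPERSON = "Anchorperson"
    MONOLOGUE = "Monologue"
    REPORTER = "Reporter"
    COMMERCIAL = "Commercial"
    STILL_IMAGE = "Still image"
    MISCELLANEOUS = "Miscellaneous"


# Hypothetical representation: one dict of named feature values per shot.
Shot = Dict[str, float]


def classify_shots(
    shots: Sequence[Shot],
    find_anchor_shots: Callable[[Sequence[Shot]], Set[int]],
    rule_label: Callable[[Shot], Optional[ShotType]],
    label_sequence: Callable[[Sequence[Shot]], List[ShotType]],
) -> List[ShotType]:
    """Combine three detectors into one label per shot (stage order is assumed)."""
    labels: List[Optional[ShotType]] = [None] * len(shots)

    # Stage 1: stand-in for the clustering step; returns indices of anchor shots.
    for i in find_anchor_shots(shots):
        labels[i] = ShotType.ANCHORPERSON

    # Stage 2: stand-in for the rules; returns Commercial, Still image,
    # Miscellaneous, or None when no rule fires.
    for i, shot in enumerate(shots):
        if labels[i] is None:
            labels[i] = rule_label(shot)

    # Stage 3: stand-in for the CRF; labels the remaining shots, in temporal
    # order, as Reporter or Monologue.
    pending = [i for i, label in enumerate(labels) if label is None]
    predicted = label_sequence([shots[i] for i in pending])
    for i, label in zip(pending, predicted):
        labels[i] = label

    return [label for label in labels if label is not None]


def demo() -> List[ShotType]:
    # Toy data and thresholds, invented purely to exercise the function.
    shots = [
        {"studio": 1.0, "motion": 0.2, "on_location": 0.0},
        {"studio": 0.0, "motion": 0.0, "on_location": 0.0},
        {"studio": 0.0, "motion": 0.5, "on_location": 1.0},
        {"studio": 0.0, "motion": 0.5, "on_location": 0.0},
    ]
    return classify_shots(
        shots,
        find_anchor_shots=lambda s: {i for i, x in enumerate(s) if x["studio"] > 0.5},
        rule_label=lambda x: ShotType.STILL_IMAGE if x["motion"] == 0.0 else None,
        label_sequence=lambda seq: [
            ShotType.REPORTER if x["on_location"] > 0.5 else ShotType.MONOLOGUE
            for x in seq
        ],
    )
```

Passing the detectors in as callables keeps the sketch independent of any particular clustering or CRF library; a real system would replace each placeholder with a model trained on the multimodal features listed above.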
