Recent Advances in Video Content Analysis: From Visual Features to Semantic Video Segments

This paper addresses the problem of automatically partitioning a video into semantic segments using visual low-level features only. Semantic segments may be understood as building content blocks of a video with a clear sequential content structure. Examples are reports in a news program, episodes in a movie, scenes of a situation comedy or topic segments of a documentary. In some video genres like news programs or documentaries, the usage of different media (visual, audio, speech, text) may be beneficial or is even unavoidable for reliably detecting the boundaries between semantic segments. In many other genres, however, the pay-off in using different media for the purpose of high-level segmentation is not high. On the one hand, relating the audio, speech or text to the semantic temporal structure of video content is generally very difficult. This is especially so in "acting" video genres like movies and situation comedies. On the other hand, the information contained in the visual stream of these video genres often seems to provide the major clue about the position of semantic segments boundaries. Partitioning a video into semantic segments can be performed by measuring the coherence of the content along neighboring video shots of a sequence. The segment boundaries are then found at places (e.g., shot boundaries) where the values of content coherence are sufficiently low. On the basis of two state-of-the-art techniques for content coherence modeling, we illustrate in this paper the current possibilities for detecting the boundaries of semantic segments using visual low-level features only.

[1]  Wolfgang Effelsberg,et al.  On the detection and recognition of television commercials , 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.

[2]  HongJiang Zhang,et al.  Automatic parsing of TV soccer programs , 1995, Proceedings of the International Conference on Multimedia Computing and Systems.

[3]  Boon-Lock Yeo,et al.  Rapid scene analysis on compressed video , 1995, IEEE Trans. Circuits Syst. Video Technol..

[4]  Shingo Uchihashi,et al.  Summarizing video using a shot importance measure and a frame-packing algorithm , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[5]  Alberto Del Bimbo,et al.  Retrieval of commercials by video semantics , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[6]  Qian Huang,et al.  Automated semantic structure reconstruction and representation generation for broadcast news , 1998, Electronic Imaging.

[7]  Dragutin Petkovic,et al.  CueVideo: a system for cross-modal search and browse of video databases , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[8]  Andreas Girgensohn,et al.  A genetic algorithm for video segmentation and summarization , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[9]  Nuno Vasconcelos,et al.  A spatiotemporal motion model for video summarization , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[10]  Jeho Nam,et al.  Video abstract of video , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).

[11]  Alan Hanjalic,et al.  Template-based detection of anchorperson shots in news programs , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[12]  Hideo Hashimoto,et al.  Video indexing using motion vectors , 1992, Other Conferences.

[13]  Takafumi Miyatake,et al.  IMPACT: an interactive natural-motion-picture dedicated multimedia authoring system , 1991, CHI.

[14]  K. R. Ramakrishnan,et al.  Neural net based scene change detection for video classification , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).

[15]  Xin Liu,et al.  Generating optimal video summaries , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[16]  Yannis Avrithis,et al.  Broadcast news parsing using visual cues: a robust face detection approach , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[17]  Borko Furht,et al.  Video and Image Processing in Multimedia Systems , 1995 .

[18]  Alan Hanjalic,et al.  DANCERS: Delft advanced news retrieval system , 2001, IS&T/SPIE Electronic Imaging.

[19]  Yoshinobu Tonomura,et al.  Projection-detecting filter for video cut detection , 1994, MULTIMEDIA '93.

[20]  Wolfgang Effelsberg,et al.  Automatic recognition of film genres , 1995, MULTIMEDIA '95.

[21]  Akio Nagasaka,et al.  Automatic Video Indexing and Full-Video Search for Object Appearances , 1991, VDB.

[22]  Tsuhan Chen,et al.  Multimedia content classification using motion and audio information , 1997, Proceedings of 1997 IEEE International Symposium on Circuits and Systems. Circuits and Systems in the Information Age ISCAS '97.

[23]  Boon-Lock Yeo,et al.  Video visualization for compact presentation and fast browsing of pictorial content , 1997, IEEE Trans. Circuits Syst. Video Technol..

[24]  Alan Hanjalic,et al.  Optimal shot boundary detection based on robust statistical models , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[25]  Yasuo Ariki,et al.  Extraction of TV news articles based on scene cut detection using DCT clustering , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[26]  Stefan Eickeler,et al.  Content-based video indexing of TV broadcast news using hidden Markov models , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[27]  Andreas Girgensohn,et al.  Video classification using transform coefficients , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[28]  Zhu Liu,et al.  Integration of multimodal features for video scene classification based on HMM , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).

[29]  Anil K. Jain,et al.  Automatic classification of tennis video for high-level content-based retrieval , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[30]  A. Lippman,et al.  A Bayesian video modeling framework for shot segmentation and content characterization , 1997, 1997 Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries.

[31]  Zhu Liu,et al.  Joint video scene segmentation and classification based on hidden Markov model , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[32]  Alan Hanjalic,et al.  An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis , 1999, IEEE Trans. Circuits Syst. Video Technol..

[33]  Stefanos D. Kollias,et al.  Efficient video summarization based on a fuzzy video content representation , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).

[34]  Tiecheng Liu,et al.  A hidden Markov model approach to the structure of documentaries , 2000, 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries.

[35]  Sanjeev R. Kulkarni,et al.  Automated analysis and annotation of basketball video , 1997, Electronic Imaging.

[36]  Nevenka Dimitrova,et al.  Parsing TV programs for identification and removal of nonstory segments , 1998, Electronic Imaging.

[37]  Xin Liu,et al.  Video summarization using singular value decomposition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[38]  Shih-Ping Liou,et al.  Videoabstract: a hybrid approach to generate semantically meaningful video summaries , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[39]  Stanley Boykin,et al.  Improving broadcast news segmentation processing , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[40]  John R. Kender,et al.  Video scene segmentation via continuous video coherence , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[41]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..