Logical unit and scene detection: a comparative survey

Logical units are semantic video segments above the shot level. Depending on the semantics shared within a unit and on the data domain, different types of logical unit extraction algorithms have been proposed in the literature. Topic units are typically extracted from documentaries or news broadcasts, while scenes are extracted from narrative-driven video such as feature films, sitcoms, or cartoons. Other types of logical units are extracted from home video and sports video. This paper reviews the algorithms used for logical unit extraction according to five categories: unit type, data domain, features used, segmentation method, and thresholds applied. A detailed comparative study is then presented for the case of extracting scenes from narrative-driven video. Whereas earlier comparative studies focused either on scene segmentation methods alone or on complete news-story segmentation algorithms, this paper investigates various visual features, segmentation methods with their thresholding mechanisms, and their combination into complete scene detection algorithms. The performance of the resulting large set of algorithms is evaluated on a set of video files that includes feature films, sitcoms, children's shows, a detective story, and cartoons.
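
To make the notion of combining a visual feature, a segmentation method, and a threshold concrete, the following minimal sketch may help. It is an illustration only, not one of the algorithms evaluated in this paper: it assumes that each shot is already summarized by an L1-normalized color histogram, uses histogram intersection as the similarity measure, and places a scene boundary wherever the best similarity between the shots just before and just after a candidate position falls below a fixed threshold. The function names, the window size, and the threshold value are all hypothetical choices.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def detect_scene_boundaries(
    shot_histograms: Sequence[Sequence[float]],
    window: int = 2,        # hypothetical: number of shots compared on each side
    threshold: float = 0.5  # hypothetical: minimum coherence to stay in one scene
) -> List[int]:
    """Return every index i such that a scene boundary is placed before shot i."""
    boundaries = []
    for i in range(1, len(shot_histograms)):
        before = shot_histograms[max(0, i - window):i]
        after = shot_histograms[i:i + window]
        # Coherence: best similarity between any shot before and any shot after.
        coherence = max(
            histogram_intersection(p, q) for p in before for q in after
        )
        if coherence < threshold:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy data: three reddish shots followed by three bluish shots.
    shots = [
        [0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.9, 0.1, 0.0],
        [0.0, 0.1, 0.9], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8],
    ]
    print(detect_scene_boundaries(shots))
```

Swapping the histogram for another feature, the windowed comparison for another segmentation method, or the fixed threshold for an adaptive one yields a different detector, which is the kind of combination the comparative study examines.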