An overview of video shot clustering and summarization techniques for mobile applications

The problem of content characterization of video programmes is of great interest because video appeals to large audiences and its efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper we analyze several techniques proposed in literature for content characterization of video programmes, including movies and sports, that could be helpful for mobile media consumption. In particular we focus our analysis on shot clustering methods and effective video summarization techniques since, in the current video analysis scenario, they facilitate the access to the content and help in quick understanding of the associated semantics. First we consider the shot clustering techniques based on low-level features, using visual, audio and motion information, even combined in a multi-modal fashion. Then we concentrate on summarization techniques, such as static storyboards, dynamic video skimming and the extraction of sport highlights. Discussed summarization methods can be employed in the development of tools that would be greatly useful to most mobile users: in fact these algorithms automatically shorten the original video while preserving most events by highlighting only the important content. The effectiveness of each approach has been analyzed, showing that it mainly depends on the kind of video programme it relates to, and the type of summary or highlights we are focusing on.

[1]  Marcel Worring,et al.  Multimodal Video Indexing : A Review of the State-ofthe-art , 2001 .

[2]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[3]  Riccardo Leonardi,et al.  Semantic video indexing using MPEG motion vectors , 2000, 2000 10th European Signal Processing Conference.

[4]  Willem Jonker,et al.  Recognizing Strokes in Tennis Videos using Hidden Markov Models , 2001, VIIP.

[5]  Riccardo Leonardi,et al.  Event recognition in sport programs using low-level motion indices , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[6]  Anil K. Jain,et al.  Automatic classification of tennis video for high-level content-based retrieval , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[7]  Richard J. Qian,et al.  Detecting semantic events in soccer games: towards a complete solution , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[8]  Avideh Zakhor,et al.  Content analysis of video using principal components , 1998, IEEE Trans. Circuits Syst. Video Technol..

[9]  Michael A. Smith,et al.  Video skimming and characterization through the combination of image and language understanding techniques , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  HongJiang Zhang,et al.  Automatic parsing of TV soccer programs , 1995, Proceedings of the International Conference on Multimedia Computing and Systems.

[11]  Peter J. L. van Beek,et al.  Detection of slow-motion replay segments in sports video for highlights generation , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[12]  Shih-Fu Chang,et al.  The holy grail of content-based media analysis , 2002 .

[13]  Lifeng Sun,et al.  Joint Inter and Intra Shot Modeling for Spectral Video Shot Clustering , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[14]  Boon-Lock Yeo,et al.  Analysis And Presentation Of Soccer Highlights From Digital Video , 1995 .

[15]  Yoshinao Aoki,et al.  Indexing of baseball telecast for content-based video retrieval , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[16]  Aggelos K. Katsaggelos,et al.  MINMAX optimal video summarization , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[17]  C.-C. Jay Kuo,et al.  Audio content analysis for online audiovisual data segmentation and classification , 2001, IEEE Trans. Speech Audio Process..

[18]  Xin Liu,et al.  Video summarization and retrieval using singular value decomposition , 2003, Multimedia Systems.

[19]  Sanjeev R. Kulkarni,et al.  Automated analysis and annotation of basketball video , 1997, Electronic Imaging.

[20]  Jean-Marc Odobez,et al.  Video Shot Clustering using Spectral Methods , 2003 .

[21]  Riccardo Leonardi,et al.  Semantic Indexing of Multimedia Documents , 2002, IEEE Multim..

[22]  Mei Han,et al.  Extract highlights from baseball game video with hidden Markov models , 2002, Proceedings. International Conference on Image Processing.

[23]  Alan Hanjalic,et al.  Shot-boundary detection: unraveled and resolved? , 2002, IEEE Trans. Circuits Syst. Video Technol..

[24]  Riccardo Leonardi,et al.  A Markov chain model for semantic indexing of sport program sequences. , 2003 .

[25]  Anoop Gupta,et al.  Automatically extracting highlights for TV Baseball programs , 2000, ACM Multimedia.

[26]  Xavier Binefa,et al.  An EM algorithm for video summarization, generative model approach , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[27]  Xin Liu,et al.  Video summarization using singular value decomposition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[28]  N. Vincent,et al.  3 classes segmentation for analysis of football audio sequences , 2002, 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628).

[29]  Patrick Gros,et al.  Hierarchical structure analysis of sport videos using HMMS , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[30]  Alexander C. Loui,et al.  Consumer video structuring by probabilistic merging of video segments , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[31]  David G. Stork,et al.  Pattern Classification , 1973 .

[32]  Andreas Girgensohn,et al.  Time-Constrained Keyframe Selection Technique , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[33]  Riccardo Leonardi,et al.  Modelling of visual features by Markov chains for sport content characterization , 2002, 2002 11th European Signal Processing Conference.

[34]  Nuno Vasconcelos,et al.  A spatiotemporal motion model for video summarization , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[35]  Brett Adams Where does computational media aesthetics fit , 2003 .

[36]  Shih-Fu Chang,et al.  Structure analysis of sports video using domain models , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[37]  Rainer Lienhart Dynamic video summarization of home video , 1999, Electronic Imaging.

[38]  A. Murat Tekalp,et al.  Automatic soccer video analysis and summarization , 2003, IEEE Trans. Image Process..

[39]  Milan Petkovic,et al.  Multi-modal extraction of highlights from TV Formula 1 programs , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[40]  Neill Campbell,et al.  Comic-like Layout of Video Summaries , 2006 .

[41]  Sang Uk Lee,et al.  Efficient video indexing scheme for content-based retrieval , 1999, IEEE Trans. Circuits Syst. Video Technol..

[42]  Regunathan Radhakrishnan,et al.  Motion activity-based extraction of key-frames from video shots , 2002, Proceedings. International Conference on Image Processing.

[43]  Stefanos D. Kollias,et al.  Video content representation using optimal extraction of frames and scenes , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[44]  Noboru Babaguchi,et al.  Video Summarization for Large Sports Video Archives , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[45]  David S. Doermann,et al.  Video summarization by curve simplification , 1998, MULTIMEDIA '98.

[46]  John R. Kender,et al.  Video scene segmentation via continuous video coherence , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[47]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[48]  Vojkan Mihajlovic,et al.  Automatic Annotation of Formula 1 Races for Content-Based Video Retrieval , 2001 .

[49]  Yanjun Qi,et al.  Supervised classification for video shot segmentation , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[50]  Andreas Girgensohn,et al.  Stained-glass visualization for highly condensed video summaries , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[51]  Yueting Zhuang,et al.  Adaptive key frame extraction using unsupervised clustering , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[52]  Shih-Fu Chang,et al.  Semantic video clustering across sources using bipartite spectral clustering , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[53]  Irena Koprinska,et al.  Temporal video segmentation: A survey , 2001, Signal Process. Image Commun..

[54]  C.-C. Jay Kuo,et al.  Rule-based video classification system for basketball video indexing , 2000, MULTIMEDIA '00.

[55]  Shih-Fu Chang,et al.  Structure analysis of soccer video with hidden Markov models , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[56]  Shih-Fu Chang,et al.  A utility framework for the automatic generation of audio-visual skims , 2002, MULTIMEDIA '02.

[57]  Boon-Lock Yeo,et al.  Time-constrained clustering for segmentation of video into story units , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[58]  Zhu Liu,et al.  Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..

[59]  Jeho Nam,et al.  Dynamic video summarization and visualization , 1999, MULTIMEDIA '99.

[60]  Alan Hanjalic,et al.  An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis , 1999, IEEE Trans. Circuits Syst. Video Technol..

[61]  Alexander C. Loui,et al.  Finding structure in home videos by probabilistic hierarchical clustering , 2003, IEEE Trans. Circuits Syst. Video Technol..

[62]  Boon-Lock Yeo,et al.  Segmentation of Video by Clustering and Graph Analysis , 1998, Comput. Vis. Image Underst..

[63]  Shih-Fu Chang,et al.  Determining computable scenes in films and their structures using audio-visual memory models , 2000, ACM Multimedia.

[64]  Boon-Lock Yeo,et al.  Video visualization for compact presentation and fast browsing of pictorial content , 1997, IEEE Trans. Circuits Syst. Video Technol..

[65]  Shih-Fu Chang,et al.  Algorithms and system for segmentation and structure analysis in soccer video , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[66]  Chong-Wah Ngo,et al.  Video summarization and scene detection by graph modeling , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[67]  Shingo Uchihashi,et al.  Video Manga: generating semantically meaningful video summaries , 1999, MULTIMEDIA '99.

[68]  Shih-Fu Chang,et al.  Constrained utility maximization for generating visual skims , 2001, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001).

[69]  Riccardo Leonardi,et al.  Extraction of Significant Video Summaries by Dendrogram Analysis , 2006, 2006 International Conference on Image Processing.

[70]  HongJiang Zhang,et al.  A model of motion attention for video skimming , 2002, Proceedings. International Conference on Image Processing.