A framework for flexible summarization of racquet sports video using multiple modalities

While most existing sports video research focuses on detecting events in sports such as soccer and baseball, little work has addressed flexible content summarization of racquet sports video, e.g. tennis and table tennis. By taking advantage of the periodicity of video shot content and audio keywords in racquet sports video, we propose a novel flexible video content summarization framework. Our approach combines a structure event detection method with a highlight ranking algorithm. First, unsupervised shot clustering and supervised audio classification are performed to obtain the visual and audio mid-level patterns, respectively. Then, a temporal voting scheme for structure event detection is proposed that exploits the correspondence between audio and video content. Finally, using affective features extracted from the detected events, a linear highlight model is adopted to rank the detected events by their degree of excitement. Experimental results show that the proposed approach is effective.
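The final ranking step can be illustrated with a minimal sketch. It assumes each detected event carries a few affective features normalized to [0, 1]; the feature names (`motion_activity`, `applause_duration`, `rally_length`), the weights, and the duration-budget selection are hypothetical choices made for illustration, not the features or parameters used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float    # event start time in seconds
    end: float      # event end time in seconds
    features: dict  # hypothetical affective features, each normalized to [0, 1]


# Illustrative weights only; a real system would fit or tune these.
WEIGHTS = {"motion_activity": 0.4, "applause_duration": 0.4, "rally_length": 0.2}


def highlight_score(event, weights=WEIGHTS):
    """Linear highlight model: weighted sum of the event's affective features."""
    return sum(w * event.features.get(name, 0.0) for name, w in weights.items())


def rank_events(events, weights=WEIGHTS):
    """Return events ordered from most to least exciting."""
    return sorted(events, key=lambda e: highlight_score(e, weights), reverse=True)


def summarize(events, budget_seconds, weights=WEIGHTS):
    """Greedily keep the highest-ranked events that fit the duration budget,
    then restore temporal order for playback."""
    chosen, used = [], 0.0
    for event in rank_events(events, weights):
        duration = event.end - event.start
        if used + duration <= budget_seconds:
            chosen.append(event)
            used += duration
    return sorted(chosen, key=lambda e: e.start)


events = [
    Event(0, 10, {"motion_activity": 0.2, "applause_duration": 0.1, "rally_length": 0.3}),
    Event(20, 35, {"motion_activity": 0.9, "applause_duration": 0.8, "rally_length": 0.7}),
    Event(40, 48, {"motion_activity": 0.5, "applause_duration": 0.6, "rally_length": 0.4}),
]
summary = summarize(events, budget_seconds=25)
```

Changing `budget_seconds` is what would make such a summary flexible: the same ranking yields a shorter or longer summary depending on how much viewing time is requested.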
