Audio-Visual Event Recognition with Application in Sports Video

We summarize our recent work on detecting and recognizing “highlight” events in sports video. We have developed two joint audio-visual fusion frameworks for this task: an “audio-visual coupled hidden Markov model” and “audio classification then visual hidden Markov model verification”. Our comparative study shows that the second framework outperforms the first by a large margin. The study also suggests that modeling so-called middle-level features, such as audience reactions and camera patterns, is important in sports video.
