Highlights extraction from sports video based on an audio-visual marker detection framework

We propose a sports highlights extraction framework that combines a visual object detection algorithm, which finds local semantic objects in video frames (e.g., the baseball catcher), with an audio classification algorithm, which finds semantic audio objects in the audio track. The resulting highlight candidates are then grouped into finer-resolution highlight segments using color or motion information. During this grouping phase, many false alarms can be correctly identified and eliminated. Our experimental results on baseball, soccer, and golf video are promising.
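As a rough illustration of the candidate-then-filter idea, the following minimal sketch pairs visual and audio markers into candidate segments and discards candidates whose color histogram is far from a reference. It is not the implementation evaluated here: the `Marker` type, the `max_gap` pairing window, the L1 histogram distance, the reference histogram, and the threshold are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    time: float  # seconds from the start of the video
    kind: str    # "visual" (e.g., a detected catcher) or "audio" (e.g., a detected audio class)


def candidate_segments(markers, max_gap=10.0):
    """Pair each visual marker with the first audio marker that follows it
    within max_gap seconds (hypothetical pairing rule)."""
    visual = sorted(m.time for m in markers if m.kind == "visual")
    audio = sorted(m.time for m in markers if m.kind == "audio")
    segments = []
    for v in visual:
        following = [a for a in audio if v <= a <= v + max_gap]
        if following:
            segments.append((v, following[0]))
    return segments


def l1_distance(h1, h2):
    """Sum of absolute bin differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def drop_false_alarms(segments, histograms, reference, threshold=0.5):
    """Keep only segments whose color histogram lies within `threshold`
    of an assumed reference histogram; the rest are treated as false alarms."""
    return [
        seg
        for seg, hist in zip(segments, histograms)
        if l1_distance(hist, reference) <= threshold
    ]


if __name__ == "__main__":
    markers = [
        Marker(5.0, "visual"),
        Marker(9.0, "audio"),
        Marker(40.0, "visual"),   # no audio marker follows within 10 s
        Marker(100.0, "audio"),
    ]
    print(candidate_segments(markers))
```

Motion information could be substituted for the color histogram in the same filtering step by replacing the feature vectors passed to `drop_false_alarms`.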
