Highlight detection and indexing in broadcast sports video by collaborative processing of text, audio, and image

This paper proposes a highlight detection method and an indexing method for broadcast sports video based on the collaborative processing of text, audio, and image. The method first analyzes the appearance pattern of words in the closed caption text stream to detect candidate highlight intervals. Next, these intervals are checked against their audio levels, and those that appear to be erroneous are rejected. Finally, the remaining highlight intervals are segmented into shots, and the shots are indexed by identifying the highlight shots from audio levels and dominant color information. When the method was applied to a real football broadcast, highlight intervals were detected with a recall of 77% and a precision of 84%. Moreover, for the intervals in which highlights were correctly detected, shot indexing was correct in 75% of cases when only the first candidate was considered, and in 97% of cases when up to the second candidate was considered. We verified experimentally that efficient processing can be achieved by stepwise analysis of the text, audio, and image. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 22–31, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10493
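The stepwise idea (cheap text analysis first, then an audio check on the surviving candidates, then evaluation by recall and precision) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the word-pattern analysis, the audio criterion, or the thresholds, so simple keyword matching, a peak-level threshold, and the names `candidate_intervals`, `filter_by_audio`, and `precision_recall` are hypothetical stand-ins rather than the authors' method. Shot segmentation and dominant-color indexing are not sketched.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds


def candidate_intervals(captions, keywords, window=10.0):
    """Text stage (assumed): merge +/- window seconds around caption keyword hits.

    captions is a list of (timestamp_seconds, text) pairs.
    """
    hits = sorted(t for t, text in captions
                  if any(k in text.lower() for k in keywords))
    intervals = []
    for t in hits:
        start, end = max(0.0, t - window), t + window
        if intervals and start <= intervals[-1].end:
            # Overlaps the previous candidate, so extend it.
            intervals[-1].end = max(intervals[-1].end, end)
        else:
            intervals.append(Interval(start, end))
    return intervals


def filter_by_audio(intervals, audio_levels, threshold):
    """Audio stage (assumed): keep candidates whose peak level reaches threshold.

    audio_levels holds one level per second of the broadcast.
    """
    kept = []
    for iv in intervals:
        segment = audio_levels[int(iv.start):int(iv.end) + 1]
        if segment and max(segment) >= threshold:
            kept.append(iv)
    return kept


def precision_recall(detected, truth):
    """Score detections against ground truth by interval overlap."""
    def overlaps(a, b):
        return a.start < b.end and b.start < a.end

    correct = sum(any(overlaps(d, t) for t in truth) for d in detected)
    found = sum(any(overlaps(d, t) for d in detected) for t in truth)
    precision = correct / len(detected) if detected else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall
```

Running the audio check only on intervals that the text stage proposes keeps the more expensive signal processing off most of the broadcast, which is the kind of efficiency the stepwise design aims for.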
