Event based indexing of broadcasted sports video by intermodal collaboration

In this paper, we propose event-based video indexing, a form of indexing by semantic content. Because video data comprises multimodal information streams, namely visual, auditory, and textual (closed caption, CC) streams, we introduce a strategy of intermodal collaboration: collaborative processing that takes account of the semantic dependencies between these streams. Its aim is to improve the reliability and efficiency of video content analysis. Focusing here on the temporal correspondence between the visual and CC streams, the proposed method extracts keywords from the CC stream to find time spans in which events are likely to take place, and then indexes shots in the visual stream. Experimental results for broadcast sports video of American football games indicate that intermodal collaboration is effective for indexing video by events such as touchdown (TD) and field goal (FG).
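The pipeline described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the keyword lists, the fixed padding around each keyword hit, and the data shapes for captions and shots are all assumptions made for the example.

```python
# Sketch of intermodal collaboration for event indexing (illustrative only):
# 1) spot event keywords in a timestamped closed-caption (CC) stream,
# 2) turn each hit into a candidate time span,
# 3) tag shots in the visual stream that overlap a candidate span.

# Hypothetical keyword lists per event class (assumption, not from the paper).
EVENT_KEYWORDS = {
    "TD": ["touchdown"],
    "FG": ["field goal"],
}

def find_event_spans(captions, pad=10.0):
    """captions: list of (time_sec, text) CC entries.
    Returns candidate spans as (start_sec, end_sec, event_label),
    padding each keyword hit by `pad` seconds on both sides."""
    spans = []
    for t, text in captions:
        lowered = text.lower()
        for label, words in EVENT_KEYWORDS.items():
            if any(w in lowered for w in words):
                spans.append((max(0.0, t - pad), t + pad, label))
    return spans

def index_shots(shots, spans):
    """shots: list of (start_sec, end_sec) from shot boundary detection.
    Returns {shot_index: [event_labels]} for shots overlapping any span."""
    index = {}
    for i, (s_start, s_end) in enumerate(shots):
        for e_start, e_end, label in spans:
            if s_start < e_end and s_end > e_start:  # interval overlap
                index.setdefault(i, []).append(label)
    return index

captions = [(120.0, "He runs it in for a TOUCHDOWN!"),
            (300.0, "The field goal is good.")]
shots = [(0.0, 60.0), (115.0, 130.0), (295.0, 305.0)]
spans = find_event_spans(captions)
print(index_shots(shots, spans))  # shots 1 and 2 get tagged TD and FG
```

Restricting shot-level analysis to these CC-derived spans is what yields the efficiency gain the abstract mentions: only shots inside candidate spans need visual processing.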
