Content-Based TV Sports Video Retrieval Based on Audio-Visual Features and Text Information

In this paper, we propose a content-based video retrieval method, that is, retrieval of video by its semantic content. Because video data is composed of multimodal information streams (visual, auditory, and textual), we describe a strategy that uses multimodal analysis to parse sports video automatically. The paper first defines the basic structure of a sports video database system and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing, and text extraction to realize video retrieval. Experimental results on TV broadcasts of football games indicate that multimodal analysis is effective for video retrieval, whether the user quickly browses tree-structured video clips or enters keywords within a predefined domain.
