Integrated approach to multimodal media content analysis

In this work, we present a system for the automatic segmentation, indexing and retrieval of audiovisual data based on the combination of audio, visual and textural content analysis. The video stream is demultiplexed into audio, image and caption components. Then, a semantic segmentation of the audio signal based on audio content analysis is conducted, and each segment is indexed as one of the basic audio types. The image sequence is segmented into shots based on visual information analysis, and keyframes are extracted from each shot. Meanwhile, keywords are detected from the closed caption. Index tables are designed for both linear and non-linear access to the video. It is shown by experiments that the proposed methods for multimodal media content analysis are effective. And that the integrated framework achieves satisfactory results for video information filtering and retrieval.

[1]  C.-C. Jay Kuo,et al.  Heuristic approach for generic audio data segmentation and annotation , 1999, MULTIMEDIA '99.

[2]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[3]  Shih-Fu Chang,et al.  Clustering methods for video browsing and annotation , 1996, Electronic Imaging.

[4]  Akio Nagasaka,et al.  Automatic Video Indexing and Full-Video Search for Object Appearances , 1991, VDB.

[5]  C.-C. Jay Kuo,et al.  Video content parsing based on combined audio and visual information , 1999, Optics East.

[6]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[7]  C.-C. Jay Kuo,et al.  Classification and retrieval of sound effects in audiovisual data management , 1999, Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers (Cat. No.CH37020).

[8]  John S. Boreczky,et al.  A hidden Markov model framework for video segmentation using audio and image features , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9]  Thomas D. C. Little,et al.  A Survey of Technologies for Parsing and Indexing Digital Video1 , 1996, J. Vis. Commun. Image Represent..