A probabilistic framework for semantic video indexing, filtering, and retrieval

Semantic filtering and retrieval of multimedia content is crucial for efficient use of the multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedia pattern recognition problem. We propose a probabilistic framework for semantic video indexing, which call support filtering and retrieval and facilitate efficient content-based access. To map low-level features to high-level semantics we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music etc. Semantic concepts in videos interact and to model this interaction explicitly, we propose a network of multijects (multinet). Using probabilistic models for six site multijects, rocks, sky, snow, water-body forestry/greenery and outdoor and using a Bayesian belief network as the multinet we demonstrate the application of this framework to semantic indexing. We demonstrate how detection performance can be significantly improved using the multinet to take interconceptual relationships into account. We also show how the multinet can fuse heterogeneous features to support detection based on inference and reasoning.

[1]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[2]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3]  H. Vincent Poor,et al.  An introduction to signal detection and estimation (2nd ed.) , 1994 .

[4]  Yihong Gong,et al.  Video parsing using compressed data , 1994, Electronic Imaging.

[5]  H. Vincent Poor,et al.  An Introduction to Signal Detection and Estimation , 1994, Springer Texts in Electrical Engineering.

[6]  Boon-Lock Yeo,et al.  Rapid scene analysis on compressed video , 1995, IEEE Trans. Circuits Syst. Video Technol..

[7]  Minerva M. Yeung,et al.  Efficient matching and clustering of video shots , 1995, Proceedings., International Conference on Image Processing.

[8]  Shih-Fu Chang,et al.  Scene change detection in an MPEG-compressed video sequence , 1995, Electronic Imaging.

[9]  Rangachar Kasturi,et al.  Machine vision , 1995 .

[10]  Yücel Altunbasak,et al.  Content-based video retrieval and compression: a unified solution , 1997, Proceedings of International Conference on Image Processing.

[11]  Wayne H. Wolf,et al.  Hidden Markov model parsing of video programs , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Nilesh V. Patel,et al.  Video Segmentation for Video Data Management , 1997, Handbook of Multimedia Information Management.

[13]  Jeho Nam,et al.  Speaker identification and video analysis for hierarchical video shot classification , 1997, Proceedings of International Conference on Image Processing.

[14]  B. S. Manjunath,et al.  Content-based search of video using color, texture, and motion , 1997, Proceedings of International Conference on Image Processing.

[15]  Shih-Fu Chang,et al.  Spatio-temporal video search using the object based video representation , 1997, Proceedings of International Conference on Image Processing.

[16]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[17]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[18]  A. Murat Tekalp,et al.  A high-performance shot boundary detection algorithm using multiple cues , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[19]  Hiroshi Hamada,et al.  Video Handling with Music and Speech Detection , 1998, IEEE Multim..

[20]  Wei Xiong,et al.  Query by video clip , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[21]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[22]  Anil K. Jain,et al.  Shape-Based Retrieval: A Case Study With Trademark Image Databases , 1998, Pattern Recognit..

[23]  Milind R. Naphade,et al.  Novel scheme for fast and efficent video sequence matching using compact signatures , 1999, Electronic Imaging.

[24]  C.-C. Jay Kuo,et al.  Integrated approach to multimodal media content analysis , 1999, Electronic Imaging.

[25]  M. Ibrahim Sezan,et al.  A computational approach to semantic event detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[26]  Milind R. Naphade,et al.  Stochastic modeling of soundtrack for efficient segmentation and indexing of video , 1999, Electronic Imaging.

[27]  A. Murat Tekalp,et al.  Probabilistic Analysis and Extraction of Video Content , 1999, ICIP.

[28]  Milind R. Naphade,et al.  Multimodal pattern matching for audio-visual query and retrieval , 2001, IS&T/SPIE Electronic Imaging.