Factor graph framework for semantic video indexing

Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern-recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework which can enforce spatio-temporal constraints. Using probabilistic models for multijects, rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved using the multinet to take inter-conceptual relationships into account. Our experiments using a large video database consisting of clips from several movies and based on a set of five semantic concepts reveal a significant improvement in detection performance by over 22%. We also show how the multinet is extended to take temporal correlation into account. By constructing a dynamic multinet, we show that the detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering is possible for a predetermined set of concepts.

[1]  Nuno Vasconcelos,et al.  Bayesian modeling of video editing and structure: semantic features for video summarization and browsing , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[2]  Anil K. Jain,et al.  Shape-Based Retrieval: A Case Study With Trademark Image Databases , 1998, Pattern Recognit..

[3]  Milind R. Naphade,et al.  Novel scheme for fast and efficent video sequence matching using compact signatures , 1999, Electronic Imaging.

[4]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[5]  Milind R. Naphade,et al.  Stochastic modeling of soundtrack for efficient segmentation and indexing of video , 1999, Electronic Imaging.

[6]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[7]  A. Murat Tekalp,et al.  Probabilistic Analysis and Extraction of Video Content , 1999, ICIP.

[8]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[9]  Wayne H. Wolf,et al.  Hidden Markov model parsing of video programs , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[11]  M. Ibrahim Sezan,et al.  A computational approach to semantic event detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[12]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[13]  Shih-Fu Chang,et al.  Spatio-temporal video search using the object based video representation , 1997, Proceedings of International Conference on Image Processing.

[14]  A. Murat Tekalp,et al.  A high-performance shot boundary detection algorithm using multiple cues , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[15]  Niclas Wiberg,et al.  Codes and Decoding on General Graphs , 1996 .

[16]  Yihong Gong,et al.  Video parsing using compressed data , 1994, Electronic Imaging.

[17]  Thomas S. Huang,et al.  Image classification using a set of labeled and unlabeled images , 2000, SPIE Optics East.

[18]  Shih-Fu Chang,et al.  Scene change detection in an MPEG-compressed video sequence , 1995, Electronic Imaging.

[19]  Boon-Lock Yeo,et al.  Rapid scene analysis on compressed video , 1995, IEEE Trans. Circuits Syst. Video Technol..

[20]  Robert G. Gallager,et al.  Low-density parity-check codes , 1962, IRE Trans. Inf. Theory.

[21]  K. Ramchandran,et al.  A factor graph framework for semantic indexing and retrieval in video , 2000, 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries.

[22]  B. S. Manjunath,et al.  Content-based search of video using color, texture, and motion , 1997, Proceedings of International Conference on Image Processing.

[23]  Brendan J. Frey,et al.  Iterative Decoding of Compound Codes by Probability Propagation in Graphical Models , 1998, IEEE J. Sel. Areas Commun..

[24]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[25]  Shih-Fu Chang,et al.  VisualSEEk: a fully automated content-based image query system , 1997, MULTIMEDIA '96.

[26]  John Cocke,et al.  Optimal decoding of linear codes for minimizing symbol error rate (Corresp.) , 1974, IEEE Trans. Inf. Theory.

[27]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[28]  Robert B. McGhee,et al.  Aircraft Identification by Moment Invariants , 1977, IEEE Transactions on Computers.

[29]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[30]  Milind R. Naphade,et al.  Semantic filtering of video content , 2001, IS&T/SPIE Electronic Imaging.

[31]  H. Vincent Poor,et al.  An Introduction to Signal Detection and Estimation , 1994, Springer Texts in Electrical Engineering.

[32]  Minerva M. Yeung,et al.  Efficient matching and clustering of video shots , 1995, Proceedings., International Conference on Image Processing.

[33]  H. Vincent Poor,et al.  An introduction to signal detection and estimation (2nd ed.) , 1994 .

[34]  Wei Xiong,et al.  Query by video clip , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[35]  Brendan J. Frey,et al.  Probability Propagation and Iterative Decoding , 1996 .

[36]  Yücel Altunbasak,et al.  Content-based video retrieval and compression: a unified solution , 1997, Proceedings of International Conference on Image Processing.

[37]  C.-C. Jay Kuo,et al.  Integrated approach to multimodal media content analysis , 1999, Electronic Imaging.

[38]  Rangachar Kasturi,et al.  Machine vision , 1995 .

[39]  Charles E. McCulloch,et al.  The EM Algorithm and Its Extensions , 1998 .