A factor graph framework for semantic indexing and retrieval in video

This paper proposes a novel framework for semantic indexing and retrieval in digital video. The framework is built from probabilistic multimedia objects (multijects) and networks of such objects (multinets). The main contribution is a novel application of factor graphs to model the interactions among the multijects in a multinet at the semantic level. Factor graphs are statistical graphical models that support efficient exact and approximate inference via the sum-product algorithm. Modeling the statistical interactions between concepts with factor graphs improves the detection probability of individual multijects, provides a unified framework for integrating multiple modalities, and supports inference of unobservable concepts from their relations to observable concepts. Our experiments show a significant performance improvement when inference is performed on the factor graph models.
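To make the role of the sum-product algorithm concrete, the following is a minimal sketch and not the model used in the paper. It assumes two hypothetical binary concepts, "sky" (with detector evidence) and "outdoor" (with no direct evidence), joined by a single pairwise compatibility factor. The concept names, the function names, and all numeric values are illustrative assumptions. The sketch passes sum-product messages through the pairwise factor and checks the resulting marginals against brute-force enumeration of the joint distribution, which is exact here because the graph is a tree.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def sum_product_pair(evidence_a, evidence_b, compat):
    """Sum-product on a two-variable factor graph.

    evidence_a, evidence_b: unary factors (local detector evidence).
    compat[a][b]: pairwise factor linking the two concepts.
    Returns the posterior marginals of both variables.
    """
    n_a, n_b = len(evidence_a), len(evidence_b)
    # Message from the pairwise factor to b: sum out a.
    msg_to_b = [sum(compat[a][b] * evidence_a[a] for a in range(n_a))
                for b in range(n_b)]
    # Message from the pairwise factor to a: sum out b.
    msg_to_a = [sum(compat[a][b] * evidence_b[b] for b in range(n_b))
                for a in range(n_a)]
    # Each marginal is the product of all incoming messages, normalized.
    post_a = normalize([evidence_a[a] * msg_to_a[a] for a in range(n_a)])
    post_b = normalize([evidence_b[b] * msg_to_b[b] for b in range(n_b)])
    return post_a, post_b


def brute_force_pair(evidence_a, evidence_b, compat):
    """Reference marginals obtained by enumerating the full joint."""
    n_a, n_b = len(evidence_a), len(evidence_b)
    joint = [[evidence_a[a] * compat[a][b] * evidence_b[b]
              for b in range(n_b)] for a in range(n_a)]
    total = sum(sum(row) for row in joint)
    post_a = [sum(joint[a]) / total for a in range(n_a)]
    post_b = [sum(joint[a][b] for a in range(n_a)) / total
              for b in range(n_b)]
    return post_a, post_b


# Index 0 means "concept absent", index 1 means "concept present".
sky_evidence = [0.3, 0.7]        # assumed detector output for "sky"
outdoor_evidence = [0.5, 0.5]    # "outdoor" has no direct evidence
compat = [[0.6, 0.4],            # assumed co-occurrence weights,
          [0.1, 0.9]]            # rows: sky, columns: outdoor

post_sky, post_outdoor = sum_product_pair(sky_evidence, outdoor_evidence, compat)
ref_sky, ref_outdoor = brute_force_pair(sky_evidence, outdoor_evidence, compat)

print("P(sky present)     =", round(post_sky[1], 4))
print("P(outdoor present) =", round(post_outdoor[1], 4))
```

With these assumed values, the belief in the unobserved "outdoor" concept rises above its uninformative prior purely because of its relation to the observed "sky" concept, which is the kind of inference the abstract describes. On graphs with cycles the same message updates would be iterated and yield only approximate marginals.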
