A Hybrid Framework for Detecting the Semantics of Concepts and Context

Semantic understanding of multimedia content necessitates models for the semantics of concepts, context and structure. We propose a hybrid framework that can combine discriminant or generative models for concepts with generative models for structure and context. Using the TREC Video 2002 benchmark corpus we show that robust models can be built for several diverse visual semantic concepts. We use a novel factor graphical framework to model inter-conceptual context for 12 semantic concepts of the corpus. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance. Enforcement of this probabilistic context model enhances the detection performance further to 22% using the global multinet, whereas its factored approximation also leads to improvement by 18% over the baseline concept detection. This improvement is achieved without using any additional training data or separate annotations.

[1]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[2]  Haim H. Permuter,et al.  IBM Research TREC 2002 Video Retrieval System , 2002, TREC.

[3]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[4]  Milind R. Naphade,et al.  Classifying motion picture soundtrack for video indexing , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[5]  Dragutin Petkovic,et al.  "What is in that Video Anyway?" In Search of Better Browsing , 1999, ICMCS, Vol. 1.

[6]  John R. Smith,et al.  Role of classifiers in multimedia content management , 2003, IS&T/SPIE Electronic Imaging.

[7]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[8]  John R. Smith,et al.  A framework for moderate vocabulary semantic visual concept detection , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[9]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..