Recognizing high-level audio-visual concepts using context

The recognition of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty mainly lies in the gap that exists between low level media features and high level semantic concepts. In an attempt to bridge this gap we proposed a probabilistic framework for semantic understanding. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. We show how the framework supports detection of multiple high-level concepts, which enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve the detection of sites, objects and events. Using the concepts outdoor and a flying-helicopter we demonstrate how the factor graph multinet models context. Using ROC curves and probability of error curves we support the intuition that context should help.

[1]  Milind R. Naphade,et al.  Semantic video indexing using a probabilistic framework , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[2]  Hiroshi Hamada,et al.  Video Handling with Music and Speech Detection , 1998, IEEE Multim..

[3]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[4]  A. Murat Tekalp,et al.  A high-performance shot boundary detection algorithm using multiple cues , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[5]  K. Ramchandran,et al.  A factor graph framework for semantic indexing and retrieval in video , 2000, 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries.

[6]  C.-C. Jay Kuo,et al.  Integrated approach to multimodal media content analysis , 1999, Electronic Imaging.

[7]  Nuno Vasconcelos,et al.  Bayesian modeling of video editing and structure: semantic features for video summarization and browsing , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[8]  M. Ibrahim Sezan,et al.  A computational approach to semantic event detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[9]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[10]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[11]  Alexander G. Hauptmann,et al.  Learning to Recognize Speech by Watching Television , 1999, IEEE Intell. Syst..