Context-enhanced video understanding

Many recent efforts have aimed to index multimedia content automatically in order to bridge the semantic gap between low-level syntactic features and high-level semantics. In this paper, we propose a novel framework that uses context to index video automatically for video understanding. First, we discuss the notion of context and how it relates to video understanding. Then we present the framework we are constructing, which is modeled as an expert system that uses a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and the different data sources available with the video (metadata, text from automatic speech recognition, etc.). We also describe our approach to aligning text from speech recognition with video segments, and we present experiments using a simple implementation of our framework. Our experiments show that context can be used to improve the performance of visual detectors.
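To make the idea of context-enhanced detection concrete, the following is a minimal sketch of a rule-based engine that adjusts visual detector confidences using context (speech-recognition words aligned to a shot and video metadata). It is not the authors' implementation: the names `Shot`, `Rule`, `rescore` and `EXAMPLE_RULES`, the example concepts and the additive, clamped confidence adjustment are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Shot:
    """One video segment: detector confidences plus the context aligned to it (hypothetical)."""
    detector_scores: dict[str, float]                   # e.g. {"outdoor": 0.375}
    asr_words: set[str] = field(default_factory=set)    # ASR words aligned to this shot
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """IF condition(shot) holds THEN add `delta` to the confidence of `concept`."""
    name: str
    concept: str
    condition: Callable[[Shot], bool]
    delta: float


def rescore(shot: Shot, rules: list[Rule]) -> dict[str, float]:
    """Apply each rule in order and return adjusted scores, clamped to [0, 1].

    The shot's original detector scores are left unchanged.
    """
    scores = dict(shot.detector_scores)
    for rule in rules:
        if rule.concept in scores and rule.condition(shot):
            scores[rule.concept] = min(1.0, max(0.0, scores[rule.concept] + rule.delta))
    return scores


# Hypothetical domain knowledge encoded as context rules.
EXAMPLE_RULES = [
    # Speech about the beach supports an "outdoor" scene.
    Rule("speech mentions beach", "outdoor",
         lambda s: bool({"beach", "surf"} & s.asr_words), +0.25),
    # Talk shows are usually filmed in a studio, which argues against "outdoor".
    Rule("studio programme", "outdoor",
         lambda s: s.metadata.get("genre") == "talk show", -0.5),
]

if __name__ == "__main__":
    shot = Shot(detector_scores={"outdoor": 0.375, "face": 0.8},
                asr_words={"beach", "today"})
    print(rescore(shot, EXAMPLE_RULES))  # {'outdoor': 0.625, 'face': 0.8}
```

Because the rules fire in list order and each result is clamped, rule order can matter when adjustments saturate at 0 or 1. A fuller expert system would typically use forward chaining with conflict resolution rather than a single pass.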
