A probabilistic layered framework for integrating multimedia content and context information

Automatic indexing of large multimedia collections is important for enabling retrieval. Current approaches mostly analyze only one or two modalities of video content. Here we describe a framework for integrating multimedia content and context information that generalizes and systematizes current methods. Content information in the visual, audio, and text domains is described at different levels of granularity and abstraction. Context describes the underlying structural information that can be used to constrain the number of possible interpretations. We introduce a probabilistic framework that combines (a) Bayesian networks that describe both content and context and (b) hierarchical priors that describe the integration of content and context. We present an application that uses this framework to segment and index TV programs, and we discuss experimental results on segment classification for six and a half hours of broadcast video. In our experiments we used audio context information. Classification results were 83% for financial segments and 89% for celebrity segments.
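
The combination of content evidence with context through a hierarchical prior can be illustrated with a small discrete example. The sketch below is not the paper's model: the label set, the context variable (program type), the two modalities, and every probability are hypothetical values chosen for illustration, and the content model is simplified to a naive-Bayes product rather than a full Bayesian network. It only shows the general pattern of marginalizing a context-conditioned prior and applying Bayes' rule.

```python
from math import prod

# All names and numbers below are illustrative assumptions, not values from the paper.

def marginal_prior(context_belief, prior_given_context):
    """Hierarchical prior: P(label) = sum over c of P(label | c) * P(c)."""
    labels = next(iter(prior_given_context.values())).keys()
    return {
        label: sum(context_belief[c] * prior_given_context[c][label]
                   for c in context_belief)
        for label in labels
    }

def content_likelihood(observations, emission):
    """Naive-Bayes content model: product over modalities of P(obs | label)."""
    return {
        label: prod(emission[label][modality][value]
                    for modality, value in observations.items())
        for label in emission
    }

def classify_segment(observations, emission, context_belief, prior_given_context):
    """Posterior over labels: P(label | obs) is proportional to P(obs | label) * P(label)."""
    prior = marginal_prior(context_belief, prior_given_context)
    likelihood = content_likelihood(observations, emission)
    unnormalized = {label: likelihood[label] * prior[label] for label in prior}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}

# Hypothetical context: belief about the type of program being watched.
context_belief = {"news_program": 0.7, "talk_show": 0.3}
prior_given_context = {
    "news_program": {"financial": 0.6, "celebrity": 0.1, "other": 0.3},
    "talk_show": {"financial": 0.1, "celebrity": 0.6, "other": 0.3},
}

# Hypothetical content model: P(observation | label) for each modality.
emission = {
    "financial": {"audio": {"speech": 0.8, "music": 0.2},
                  "text": {"stock": 0.7, "movie": 0.3}},
    "celebrity": {"audio": {"speech": 0.5, "music": 0.5},
                  "text": {"stock": 0.1, "movie": 0.9}},
    "other": {"audio": {"speech": 0.5, "music": 0.5},
              "text": {"stock": 0.2, "movie": 0.8}},
}

observations = {"audio": "speech", "text": "stock"}
posterior = classify_segment(observations, emission,
                             context_belief, prior_given_context)
best_label = max(posterior, key=posterior.get)
```

In this toy setting the context belief shifts the prior toward financial segments before any content is observed, so the same audio and text evidence yields a different posterior if `context_belief` is changed to favor `talk_show`.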
