A review on multimodal video indexing

Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is infeasible for large video collections. Efficient video indexing methods based on a single modality have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. We present a framework for multimodal video indexing that views a video document from the perspective of its author. The framework serves as a blueprint for a generic and flexible multimodal video indexing system, and it generalizes different state-of-the-art video indexing methods. It also forms the basis for categorizing these methods.
