Audio Visual Cues for Video Indexing and Retrieval

This paper studies content-based video retrieval using a combination of audio and visual features. The visual feature is extracted by an adaptive video indexing technique that emphasizes accurate characterization of the spatio-temporal information within video clips. The audio feature is extracted by a statistical time-frequency analysis method that fits Laplacian mixture models to wavelet coefficients. The proposed joint audio-visual retrieval framework is flexible and scalable, and can be applied to various types of video databases.
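The audio-feature idea can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the number of mixture components, or the estimation procedure, so everything below is an assumption made for illustration: a one-level Haar transform, a two-component zero-mean Laplacian mixture, and a plain EM fit. The function names `haar_detail` and `fit_laplacian_mixture` are hypothetical and are not taken from the paper.

```python
import numpy as np

def haar_detail(signal):
    """One-level Haar wavelet detail coefficients of a 1-D signal (assumed wavelet)."""
    x = np.asarray(signal, dtype=float)
    x = x[: len(x) // 2 * 2]  # drop a trailing sample if the length is odd
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def fit_laplacian_mixture(coeffs, n_iter=100, eps=1e-12):
    """EM fit of a two-component, zero-mean Laplacian mixture (assumed model).

    Each component has density p(x | b) = exp(-|x| / b) / (2 b).
    Returns the mixing weights pi and the scale parameters b.
    """
    a = np.abs(np.asarray(coeffs, dtype=float))
    # Initialise one narrow and one wide component around the mean magnitude.
    scale = a.mean() + eps
    b = np.array([0.5 * scale, 2.0 * scale])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each coefficient.
        dens = pi / (2.0 * b) * np.exp(-a[:, None] / b)
        resp = dens / (dens.sum(axis=1, keepdims=True) + eps)
        # M-step: weights are mean responsibilities; each scale is the
        # responsibility-weighted mean of |x| (the Laplacian scale MLE).
        nk = resp.sum(axis=0) + eps
        pi = nk / nk.sum()
        b = (resp * a[:, None]).sum(axis=0) / nk + eps
    return pi, b

if __name__ == "__main__":
    # Synthetic stand-in for an audio track; real input would be decoded audio samples.
    rng = np.random.default_rng(0)
    audio = rng.laplace(0.0, 1.0, 8000)
    weights, scales = fit_laplacian_mixture(haar_detail(audio))
    # The fitted parameters could serve as a compact per-clip audio descriptor.
    print(np.concatenate([weights, scales]))
```

Under these assumptions, the fitted weights and scales form a short fixed-length vector per clip, which is one plausible way such a model could feed a retrieval index alongside the visual feature.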
