Paper Information - Audio Visual Cues for Video Indexing and Retrieval

Audio Visual Cues for Video Indexing and Retrieval

This paper studies content-based video retrieval using a combination of audio and visual features. The visual feature is extracted by an adaptive video indexing technique that places strong emphasis on accurately characterizing the spatio-temporal information within video clips. The audio feature is extracted by a statistical time-frequency analysis method that applies Laplacian mixture models to wavelet coefficients. The proposed joint audio-visual retrieval framework is flexible and scalable, and can be applied to various types of video databases.
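To make the audio side of the abstract more concrete, the following is a minimal sketch of fitting a Laplacian mixture to wavelet coefficients with expectation-maximization. It is an illustration under stated assumptions, not the authors' implementation: the one-level Haar transform, the choice of two zero-mean components, the initialization, and the names `haar_detail`, `laplace_pdf`, and `fit_laplacian_mixture` are all hypothetical. The abstract does not specify the wavelet, the number of components, or how the fitted parameters are turned into a feature.

```python
import math
import random


def haar_detail(signal):
    """One-level Haar detail coefficients of an even-length signal (assumed wavelet)."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(signal) - 1, 2)]


def laplace_pdf(w, b):
    """Zero-mean Laplacian density with scale b > 0."""
    return math.exp(-abs(w) / b) / (2.0 * b)


def fit_laplacian_mixture(coeffs, n_iter=50, eps=1e-12):
    """EM for a two-component, zero-mean Laplacian mixture.

    Returns (weights, scales); index 0 is the narrow component,
    index 1 the wide component.
    """
    n = len(coeffs)
    mean_abs = sum(abs(w) for w in coeffs) / n
    weights = [0.5, 0.5]
    # Assumed initialization: one narrow and one wide component.
    scales = [0.5 * mean_abs + eps, 2.0 * mean_abs + eps]
    for _ in range(n_iter):
        resp_sum = [0.0, 0.0]  # sum of responsibilities per component
        abs_sum = [0.0, 0.0]   # responsibility-weighted sum of |w|
        # E-step: posterior probability of each component for each coefficient.
        for w in coeffs:
            p = [weights[k] * laplace_pdf(w, scales[k]) for k in range(2)]
            total = p[0] + p[1]
            r = [pk / total for pk in p] if total > 0.0 else [0.5, 0.5]
            for k in range(2):
                resp_sum[k] += r[k]
                abs_sum[k] += r[k] * abs(w)
        # M-step: mixing weights, and the weighted mean of |w| as each scale
        # (the maximum-likelihood scale of a zero-mean Laplacian).
        weights = [resp_sum[k] / n for k in range(2)]
        scales = [abs_sum[k] / (resp_sum[k] + eps) + eps for k in range(2)]
    return weights, scales
```

One plausible use, again an assumption rather than something the abstract states, is to collect the fitted weights and scales from each wavelet subband of an audio clip into a compact descriptor that can be compared across clips.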
