Paper information - A fast audio classification from MPEG coded data

A fast audio classification from MPEG coded data

Audio classification has become an important task for applications such as automatic keyword spotting and other content-based audio-visual query systems. In this paper, we describe a fast and accurate audio classification method that operates in the MPEG coded data domain. First, silent segments are detected using an approach that is robust to different recording conditions. Then the non-silent segments are classified into three types (music, speech, and applause) using the temporal density, bandwidth, and center frequency of the subband energy. To remain robust across as wide a variety of audio sources as possible, we use a Bayes discriminant function for multivariate Gaussian distributions instead of manually adjusting a threshold for each discriminator. In the experiment, every one-second segment of MPEG audio data is classified, and about 90% of audio and speech segments are successfully detected. As for detection speed, the method requires less than 20% of the processing power needed for MPEG audio decoding.
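The Bayes discriminant for multivariate Gaussian class models named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how the three features are computed or which priors are used, so the feature vectors below are synthetic, the class centers are arbitrary, and the function names (`fit_gaussian_classes`, `discriminant`, `classify`) are hypothetical. Each class gets a mean vector, a covariance matrix, and a prior estimated from labeled samples, and a new feature vector is assigned to the class with the largest discriminant value.

```python
import numpy as np


def fit_gaussian_classes(features, labels):
    """Estimate mean, covariance, and prior for each class label."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    model = {}
    for label in np.unique(labels):
        rows = features[labels == label]
        model[str(label)] = {
            "mean": rows.mean(axis=0),
            "cov": np.cov(rows, rowvar=False),
            "prior": len(rows) / len(features),
        }
    return model


def discriminant(x, params):
    """Gaussian Bayes discriminant:
    g(x) = -1/2 (x-mu)^T S^-1 (x-mu) - 1/2 ln|S| + ln P(class)."""
    diff = np.asarray(x, dtype=float) - params["mean"]
    _, logdet = np.linalg.slogdet(params["cov"])
    mahalanobis = diff @ np.linalg.solve(params["cov"], diff)
    return -0.5 * mahalanobis - 0.5 * logdet + np.log(params["prior"])


def classify(x, model):
    """Return the label whose discriminant value is largest."""
    return max(model, key=lambda label: discriminant(x, model[label]))


# Synthetic 3-D feature vectors standing in for (temporal density,
# bandwidth, center frequency). Centers and spread are assumptions
# chosen only to make the classes separable in this demo.
rng = np.random.default_rng(0)
centers = {
    "music": [0.0, 0.0, 0.0],
    "speech": [6.0, 6.0, 6.0],
    "applause": [-6.0, 6.0, 0.0],
}
features = np.vstack(
    [rng.normal(c, 1.0, size=(200, 3)) for c in centers.values()]
)
labels = np.repeat(list(centers), 200)

model = fit_gaussian_classes(features, labels)
print(classify([5.5, 6.2, 5.9], model))
```

Because the decision comes from comparing per-class discriminant values, no per-feature threshold has to be tuned by hand, which is the motivation the abstract gives for this choice.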
