A fast audio classification from MPEG coded data

Audio information classification has become an important task for applications such as automatic keyword spotting and other content-based audio-visual query systems. In this paper, we describe a fast and accurate audio classification method that operates directly in the MPEG coded data domain. First, silent segments are detected using an approach that is robust to different recording conditions. The non-silent segments are then classified into three types (music, speech, and applause) using the temporal density, bandwidth, and center frequency of the subband energy. To remain robust across as wide a variety of audio sources as possible, we use a Bayes discriminant function for multivariate Gaussian distributions instead of manually adjusting a threshold for each discriminator. In the experiment, each one-second segment of MPEG audio data is classified, and about 90% of audio and speech segments are successfully detected. As for detection speed, classification requires less than 20% of the processing power needed for MPEG audio decoding.
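As an illustration of the classification step, the following is a minimal sketch of a Bayes discriminant function for multivariate Gaussian class models, g_i(x) = -1/2 (x - mu_i)^T Sigma_i^-1 (x - mu_i) - 1/2 ln|Sigma_i| + ln P(w_i). It is not the implementation used in the paper. The function names, the ordering of the three features, the priors taken from class frequencies, and all training values are assumptions made up for the example.

```python
import math

def mean_vector(samples):
    """Per-feature mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def covariance_matrix(samples, mu):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]

def solve_and_logdet(a, b):
    """Solve a*y = b by Gaussian elimination with partial pivoting.

    Returns (y, ln|det a|); raises ValueError if a is numerically singular.
    """
    d = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    logdet = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        s = m[i][d] - sum(m[i][j] * y[j] for j in range(i + 1, d))
        y[i] = s / m[i][i]
    return y, logdet

def discriminant(x, mu, cov, prior):
    """Gaussian Bayes discriminant g_i(x); the largest value wins."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y, logdet = solve_and_logdet(cov, diff)
    mahalanobis_sq = sum(di * yi for di, yi in zip(diff, y))
    return -0.5 * mahalanobis_sq - 0.5 * logdet + math.log(prior)

def train(labelled):
    """Estimate (mean, covariance, prior) per class from labelled vectors."""
    total = sum(len(s) for s in labelled.values())
    model = {}
    for label, samples in labelled.items():
        mu = mean_vector(samples)
        model[label] = (mu, covariance_matrix(samples, mu), len(samples) / total)
    return model

def classify(x, model):
    """Return the class label with the largest discriminant value."""
    return max(model, key=lambda label: discriminant(x, *model[label]))

# Hypothetical one-second feature vectors, invented for this example:
# [temporal density, bandwidth, center frequency] of subband energy.
training = {
    "music": [[0.90, 12.0, 8.0], [0.80, 11.0, 9.0], [0.85, 13.0, 7.5],
              [0.95, 12.5, 8.5], [0.70, 10.5, 8.2]],
    "speech": [[0.50, 5.0, 3.0], [0.40, 6.0, 3.5], [0.55, 4.5, 2.5],
               [0.45, 5.5, 3.2], [0.60, 5.2, 2.8]],
    "applause": [[0.990, 20.0, 15.0], [0.970, 19.0, 16.0], [0.980, 21.0, 14.5],
                 [0.960, 20.5, 15.5], [0.995, 19.5, 14.0]],
}
model = train(training)
print(classify([0.50, 5.0, 3.0], model))
```

Because the class means, covariances, and priors are estimated from labelled data, no per-feature threshold has to be tuned by hand, which is the motivation the abstract gives for this choice. A practical system would need far more training data per class than this toy set.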
