Speech-music discrimination from MPEG-1 bitstream

This paper proposes an algorithm for speech/music discrimination that operates on data taken directly from the MPEG-encoded bitstream, thus avoiding the computationally expensive decoding-encoding process. The method is based on thresholding features derived from the modulation envelope of the frequency-limited audio signal. The discriminator is tested on more than 2 hours of audio data, which contains clean and noisy speech from several speakers and a variety of music content. The discriminator runs in real time and, despite its simplicity, yields very promising results.
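
The idea of thresholding an envelope-derived feature can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's actual method: the input is taken to be per-frame subband energy values (for example, values obtained from the bitstream without full decoding), the frequency limit is modelled by keeping only the lowest 8 subbands, the feature is the coefficient of variation of the envelope, and the threshold of 0.5 is arbitrary. The function names are hypothetical.

```python
import math

def band_limited_envelope(subband_energies, low_band=0, high_band=8):
    """Per-frame envelope: sum of energies over subbands [low_band, high_band).

    The band range is an illustrative assumption, not a value from the paper.
    """
    return [sum(frame[low_band:high_band]) for frame in subband_energies]

def modulation_depth(envelope):
    """Coefficient of variation (std / mean) of the envelope.

    Used here as a stand-in feature: speech alternates between loud
    syllables and near-silent pauses, so its envelope fluctuates more
    than that of typical sustained music.
    """
    n = len(envelope)
    if n == 0:
        return 0.0
    mu = sum(envelope) / n
    if mu == 0:
        return 0.0
    variance = sum((e - mu) ** 2 for e in envelope) / n
    return math.sqrt(variance) / mu

def classify_window(subband_energies, threshold=0.5):
    """Label one analysis window by thresholding the envelope feature.

    The threshold is a hypothetical value; in practice it would be tuned
    on labelled data.
    """
    envelope = band_limited_envelope(subband_energies)
    return "speech" if modulation_depth(envelope) > threshold else "music"

# Synthetic 32-subband frames: bursts and pauses versus a steady level.
speech_like = [[1.0] * 32 if i % 4 < 2 else [0.05] * 32 for i in range(40)]
music_like = [[1.0 + 0.1 * ((i % 3) - 1)] * 32 for i in range(40)]

print(classify_window(speech_like))
print(classify_window(music_like))
```

Because the sketch needs only one pass over a short window of per-frame values, a classifier of this shape is cheap enough to be compatible with real-time operation, although the paper's own features and thresholds may differ.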
