Paper information - Speech-music discrimination from MPEG-1 bitstream

Speech-music discrimination from MPEG-1 bitstream

This paper describes an algorithm for speech/music discrimination that works on data taken directly from the MPEG-encoded bitstream, thus avoiding the computationally demanding decoding-encoding process. The method is based on thresholding features derived from the modulation envelope of the frequency-limited audio signal. The discriminator is tested on more than 2 hours of audio data containing clean and noisy speech from several speakers and a variety of music content. It is able to work in real time and, despite its simplicity, the results are very promising.
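The thresholding idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not state which features, window length, or thresholds are used, so the two features (`dip_ratio`, `depth`), the function names, and every numeric value below are assumptions chosen only for illustration. The input is assumed to be a per-frame energy envelope already obtained from the low-frequency part of the signal; extracting it from the MPEG bitstream is not shown.

```python
import math


def envelope_features(envelope, dip_fraction=0.2):
    """Compute two hypothetical modulation features for one analysis window.

    envelope: non-empty sequence of non-negative per-frame energy values.
    Returns (dip_ratio, depth):
      dip_ratio - fraction of frames whose energy is below dip_fraction * peak
      depth     - (peak - minimum) / peak, a simple modulation depth
    """
    if not envelope:
        raise ValueError("envelope window must not be empty")
    peak = max(envelope)
    if peak <= 0:
        # Silent window: no modulation to measure.
        return 0.0, 0.0
    dips = sum(1 for e in envelope if e < dip_fraction * peak)
    dip_ratio = dips / len(envelope)
    depth = (peak - min(envelope)) / peak
    return dip_ratio, depth


def classify_window(envelope, dip_threshold=0.25, depth_threshold=0.8):
    """Label a window 'speech' if both features exceed their thresholds.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    dip_ratio, depth = envelope_features(envelope)
    if dip_ratio >= dip_threshold and depth >= depth_threshold:
        return "speech"
    return "music"


# Synthetic envelopes (assumed 40 frames per second, 2 seconds each).
# Speech-like: strong modulation at 4 Hz with pauses between bursts.
speech_like = [max(0.0, math.sin(math.pi * n / 5)) for n in range(80)]
# Music-like: nearly constant energy with shallow modulation.
music_like = [1.0 + 0.1 * math.sin(math.pi * n / 5) for n in range(80)]

print(classify_window(speech_like))
print(classify_window(music_like))
```

The sketch relies on the common observation that speech energy drops sharply between syllables while music tends to keep a steadier level, so a simple pair of thresholds separates the two synthetic cases. Real material would need thresholds tuned on labelled data.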

Noel E. O'Connor | Noel Murphy | Seán Marlow | Roman Jarina
