A Speech/Music/Silence/Garbage Classifier for Searching and Indexing Broadcast News Material

An audio classifier that distinguishes between speech, music, silence and garbage has been developed. The classifier was trained and tested on broadcast news material provided by VRT (the Flemish Radio and Television Network). Several feature sets and machine learning algorithms were tested, giving a choice between speed and performance for a target system. The audio classifier is part of a larger system that, together with visual data, can retrieve information from news broadcasts. Speech can be converted to text and the speaker can be recognized. Music can be passed on for genre classification, jingle recognition or copyright infringement detection. Silence is detected and used to provide cues on topic changes or speaker turns. At this point, everything that is not classified as speech, music or silence is labeled garbage. Garbage classes can be further used for background categorization, giving information on the environment in which someone speaks (for example, an anchor in the studio or a reporter in the street).
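The rule that anything not classified as speech, music or silence becomes garbage can be illustrated with a minimal sketch. It assumes that some upstream model already yields per-frame confidence scores for the three known classes. The score dictionaries, the `threshold` value and the function names are hypothetical and are not taken from the paper, which may handle the garbage class differently (for instance as a trained class of its own).

```python
from typing import Dict, List

# The three classes the classifier recognizes positively.
KNOWN_CLASSES = ("speech", "music", "silence")


def label_frame(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the best-scoring known class, or "garbage" as a fallback.

    `scores` maps class names to confidences (assumed to lie in [0, 1]).
    `threshold` is an illustrative cut-off, not a value from the paper.
    """
    best = max(KNOWN_CLASSES, key=lambda name: scores.get(name, 0.0))
    if scores.get(best, 0.0) >= threshold:
        return best
    # No known class is confident enough, so the frame is rejected.
    return "garbage"


def label_stream(frames: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Label every frame of a stream independently."""
    return [label_frame(frame, threshold) for frame in frames]


if __name__ == "__main__":
    stream = [
        {"speech": 0.80, "music": 0.10, "silence": 0.10},
        {"speech": 0.40, "music": 0.35, "silence": 0.25},
        {"speech": 0.05, "music": 0.05, "silence": 0.90},
    ]
    print(label_stream(stream))
```

Raising `threshold` sends more frames to garbage, while lowering it forces more frames into one of the three known classes.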
