Comparison of Two Speech/Music Segmentation Systems For Audio Indexing on the Web

This article describes two major approaches to the speech/music segmentation task. The first uses a competing-models approach based on classical speech recognition parameters (MFCC). The second uses a class/non-class approach for each of the two main problems: speech/non-speech and music/non-music. To match the characteristics of speech and music closely, different kinds of parameters are used: MFCC and spectral coefficients. We present both approaches together with some intrinsic experiments. We then compare their speech/music discrimination accuracy on a real-world test corpus: a broadcast program containing noisy interviews, superimposed segments (speech over music), and alternating broadband and telephone speech. Within the classical approach, we observe that the first derivative alone, or the second derivative alone, plays a major role in the discrimination process, as does the number of cepstral coefficients. By contrast, the class/non-class approach behaves more homogeneously.
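As background for the finding on cepstral derivatives, the sketch below shows how first-derivative (delta) features are typically computed from a matrix of MFCC frames, using the standard regression formula. The half-window size N, the edge padding, and the function name are assumptions for illustration; the paper does not specify these details.

```python
import numpy as np

def delta(cepstra, N=2):
    """Delta (first-derivative) features over time.

    cepstra: (num_frames, num_coeffs) array of cepstral coefficients.
    N: regression half-window (an assumed value; common choices are 2 or 3).
    Applying this function twice yields the second derivative (delta-delta).
    """
    # Repeat the edge frames so boundary deltas are defined.
    padded = np.pad(cepstra, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(cepstra, dtype=float)
    for t in range(cepstra.shape[0]):
        # Weighted difference of frames ahead and behind frame t.
        d[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return d
```

On a cepstral trajectory that increases linearly over time, the interior delta values come out as the slope, which is the behavior the regression formula is designed to capture.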