Evaluation of BIC-based algorithms for audio segmentation

Abstract The Bayesian Information Criterion (BIC) is a widely adopted method for audio segmentation and has inspired a number of dominant algorithms for this application. At present, however, the literature lacks analytical and experimental studies of these algorithms. This paper partially fills that gap. Typically, BIC is applied within a sliding, variable-size analysis window, in which a single change in the nature of the audio is searched for locally. Three different implementations of the algorithm are described and compared: (i) the first maintains a pair of running sums, the sum of the input vectors and the sum of their squares, in order to save computation when estimating covariance matrices on partially shared data; (ii) the second, recently proposed in the literature, encodes the input signal with cumulative statistics for efficient estimation of covariance matrices; (iii) the third is a novel approach that encodes the input stream with the cumulative pair of sums used in the first approach. Furthermore, a dynamic programming algorithm is presented that, within the BIC model, finds a globally optimal segmentation of the input audio stream. All algorithms are analyzed in detail in terms of computational cost, experimentally evaluated on suitable tasks, and compared.
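As an illustration of the cumulative pair of sums mentioned above, the following minimal sketch searches a single analysis window for one change point. It is not the paper's implementation: it assumes one-dimensional features (so each covariance matrix reduces to a scalar variance), the standard single-Gaussian form of the BIC difference with a penalty weight `lam`, and hypothetical names (`cumulative_sums`, `log_var`, `delta_bic`, `best_change`, `margin`, `floor`) chosen only for this example.

```python
import math


def cumulative_sums(x):
    """Prefix sums of x and x**2; s1[k] is the sum of x[:k]."""
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    return s1, s2


def log_var(s1, s2, a, b, floor=1e-8):
    """Log of the maximum-likelihood variance of x[a:b], in O(1) time."""
    n = b - a
    mean = (s1[b] - s1[a]) / n
    var = (s2[b] - s2[a]) / n - mean * mean
    return math.log(max(var, floor))  # floor guards against log(0)


def delta_bic(s1, s2, a, b, i, lam=1.0):
    """BIC gain of splitting x[a:b] at i (assumed 1-D Gaussian models)."""
    n = b - a
    # One Gaussian in 1-D has 2 free parameters (mean and variance).
    penalty = lam * 0.5 * 2 * math.log(n)
    return (0.5 * n * log_var(s1, s2, a, b)
            - 0.5 * (i - a) * log_var(s1, s2, a, i)
            - 0.5 * (b - i) * log_var(s1, s2, i, b)
            - penalty)


def best_change(x, margin=5, lam=1.0):
    """Return (index, gain) of the best split, or (None, 0.0) if none is positive."""
    s1, s2 = cumulative_sums(x)
    n = len(x)
    best_i, best_v = None, 0.0
    for i in range(margin, n - margin + 1):
        v = delta_bic(s1, s2, 0, n, i, lam)
        if v > best_v:
            best_i, best_v = i, v
    return best_i, best_v
```

Because the prefix sums are computed once, every candidate split costs constant time regardless of the window size, which is the kind of saving on partially shared data that the abstract refers to. A multivariate version would store cumulative sums of outer products and use log-determinants in place of `log_var`.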
