BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization

In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using ΔBIC, a widely adopted distance measure between two audio segments. We compare our approaches to three popular distance-based approaches, namely, Chen and Gopalakrishnan's window-growing-based approach, Siegler et al.'s fixed-size sliding window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
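To make the recursive partitioning idea concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It assumes each segment is modeled by a single Gaussian with diagonal covariance (a simplification; ΔBIC is commonly defined with full covariances), that a positive ΔBIC indicates a speaker change, and that the best split point in a window is found by exhaustive search. The function names, the `min_len` and `lam` parameters, and the synthetic demo data are all hypothetical and chosen only for illustration.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, split, lam=1.0):
    """DeltaBIC for splitting `frames` at index `split`; > 0 favors two segments."""
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # Extra parameters of the two-model hypothesis: dim means + dim variances.
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return gain - penalty


def find_changes(frames, start=0, end=None, min_len=20, lam=1.0):
    """Recursively split [start, end) at the best positive-DeltaBIC point."""
    if end is None:
        end = len(frames)
    if end - start < 2 * min_len:
        return []
    window = frames[start:end]
    best_split, best_score = None, 0.0
    for split in range(min_len, len(window) - min_len + 1):
        score = delta_bic(window, split, lam)
        if score > best_score:
            best_split, best_score = split, score
    if best_split is None:
        return []  # window looks homogeneous: no change point
    cut = start + best_split
    return (find_changes(frames, start, cut, min_len, lam)
            + [cut]
            + find_changes(frames, cut, end, min_len, lam))


def make_demo_frames(seed=0, n_per_speaker=200):
    """Synthetic 2-D 'features': two speakers with well-separated means."""
    rng = random.Random(seed)
    first = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_per_speaker)]
    second = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(n_per_speaker)]
    return first + second
```

Calling `find_changes(make_demo_frames())` returns a sorted list of frame indices at which the sketch hypothesizes a speaker change. The exhaustive search shown here recomputes the statistics at every split, so it is meant only to illustrate the recursion and makes no attempt to match the efficiency of the approaches described above.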

[1]  Christian Wellekens,et al.  A speaker tracking system based on speaker turn detection for NIST evaluation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Jean-François Bonastre,et al.  Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..

[3]  Adrian E. Raftery,et al.  How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J..

[4]  Jean-Luc Gauvain,et al.  Multistage speaker diarization of broadcast news , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  M. A. Siegler,et al.  Automatic Segmentation, Classification and Clustering of Broadcast News Audio , 1997 .

[6]  Jeih-Weih Hung,et al.  Automatic metric-based speech segmentation for broadcast news via principal component analysis , 2000, INTERSPEECH.

[7]  John H. L. Hansen,et al.  Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion , 2005, IEEE Transactions on Speech and Audio Processing.

[8]  Herbert Gish,et al.  Segregation of speakers for speech recognition and speaker identification , 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[9]  Christian A. Müller,et al.  A fast-match approach for robust, faster than real-time speaker diarization , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[10]  Jhing-Fa Wang,et al.  Unsupervised speaker change detection using SVM training misclassification rate , 2007, IEEE Transactions on Computers.

[11]  Ramesh A. Gopinath,et al.  Transcription Of Broadcast News Shows With The IBM Large Vocabulary Speech Recognition System , 1997 .

[12]  Hsin-Min Wang,et al.  MATBN: A Mandarin Chinese Broadcast News Corpus , 2005, Int. J. Comput. Linguistics Chin. Lang. Process..

[13]  Christian Wellekens,et al.  DISTBIC: A speaker-based segmentation for audio data indexing , 2000, Speech Commun..

[14]  Mauro Cettolo,et al.  Evaluation of BIC-based algorithms for audio segmentation , 2005, Comput. Speech Lang..

[15]  C.-C. Jay Kuo,et al.  Audio content analysis for online audiovisual data segmentation and classification , 2001, IEEE Trans. Speech Audio Process..

[16]  Aladdin M. Ariyaeeinia,et al.  Efficient Speaker Change Detection Using Adapted Gaussian Mixture Models , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  Lie Lu,et al.  Unsupervised speaker segmentation and tracking in real-time audio content analysis , 2005, Multimedia Systems.

[18]  Aladdin M. Ariyaeeinia,et al.  On the use of the Bayesian information criterion in multiple speaker detection , 2001, INTERSPEECH.

[19]  Udi Manber,et al.  Introduction to algorithms - a creative approach , 1989 .

[20]  G. Schwarz.  Estimating the Dimension of a Model , 1978 .

[21]  S. Chen,et al.  Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[22]  Wai Nang Chan,et al.  Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[23]  Aladdin M. Ariyaeeinia,et al.  An effective unsupervised scheme for multiple-speaker-change detection , 2002, INTERSPEECH.

[24]  John H. L. Hansen,et al.  Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[25]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[26]  João Paulo da Silva Neto,et al.  Audio segmentation, classification and clustering in a broadcast news task , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[27]  Mark A. Clements,et al.  Acoustic change detection and segment clustering of two-way telephone conversations , 2003, INTERSPEECH.

[28]  Xavier Anguera Miró,et al.  Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information , 2007, IEEE Transactions on Computers.

[29]  Steve Young,et al.  The development of the 1996 HTK broadcast news transcription system , 1996 .

[30]  Hsin-Min Wang,et al.  A sequential metric-based audio segmentation method via the Bayesian information criterion , 2003, INTERSPEECH.

[31]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[32]  Ramesh A. Gopinath,et al.  Improved speaker segmentation and segments clustering using the bayesian information criterion , 1999, EUROSPEECH.

[33]  Lie Lu,et al.  Content analysis for audio classification and segmentation , 2002, IEEE Trans. Speech Audio Process..

[34]  Lie Lu,et al.  Speaker change detection and tracking in real-time news broadcasting analysis , 2002, MULTIMEDIA '02.

[35]  X. Anguera.  XBIC: Real-Time Cross Probabilities measure for speaker segmentation , 2005 .

[36]  Chung-Hsien Wu,et al.  Multiple change-point audio segmentation and classification using an MDL-based Gaussian model , 2006, IEEE Trans. Speech Audio Process..