Combining speaker identification and BIC for speaker diarization

This paper describes recent advances in speaker diarization obtained by incorporating a speaker identification step. The system builds upon the LIMSI baseline data partitioner used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split the data from a speaker into several clusters when there is a large quantity of data for that speaker. Several improvements to the baseline system have been made. First, a standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering. Then a second clustering stage has been added, using a speaker identification method with MAP-adapted GMMs. A final post-processing stage refines the segment boundaries using the output of the transcription system. On the RT-04f and ESTER evaluation data, the improved multi-stage system provides a reduction in speaker error of between 40% and 50% relative to a standard BIC clustering system.
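
As a rough illustration of what a standard BIC agglomerative clustering stage can look like, the following Python sketch merges segments greedily until no pair has a negative ΔBIC. It is a minimal sketch under stated assumptions, not the LIMSI implementation: each cluster is assumed to be modelled by a single full-covariance Gaussian, the penalty is computed from the sizes of the two clusters being compared, and the names `delta_bic`, `bic_cluster`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of frames (n, d)."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two clusters of feature frames.

    Positive values favour two separate Gaussians (different speakers);
    negative values favour a single merged Gaussian (same speaker).
    """
    n_x, d = x.shape
    n_y = y.shape[0]
    n = n_x + n_y
    merged = np.vstack([x, y])
    # Log-likelihood gain from modelling the two clusters separately.
    gain = 0.5 * (n * _logdet_cov(merged)
                  - n_x * _logdet_cov(x)
                  - n_y * _logdet_cov(y))
    # Extra parameters of the second Gaussian: mean plus full covariance.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def bic_cluster(segments, penalty_weight=1.0):
    """Greedy agglomerative clustering; returns lists of segment indices."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair is better modelled jointly
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

Calling `bic_cluster` on a list of per-segment feature matrices (one row per frame) returns groups of segment indices, one group per hypothesized speaker. Raising `penalty_weight` makes merging more likely and so yields fewer clusters; the value that works best in practice depends on the features and data, and is an assumption here.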