Towards Using STT for Broadcast News Speaker Diarization

This study investigates the use of linguistic information present in the audio signal to structure broadcast news data, and in particular to associate speaker identities with audio segments. While speaker recognition has been an active area of research for many years, identifying speakers in large audio corpora is a relatively recent problem, and work on it has mainly concerned speaker tracking. Speech transcriptions contain a wealth of linguistic information that is useful for speaker diarization. Patterns that can be used to identify the current, previous, or next speaker were developed from an analysis of 150 hours of manually transcribed broadcast news data. Each pattern is associated with one or more rules for assigning speaker identities. After validation on the training transcripts, the patterns and rules were tested on an independent data set containing transcripts of 9 hours of broadcasts, yielding a speaker diarization error rate of about 11%. Future work will validate the approach on automatically generated transcripts and combine the linguistic information with information derived from the acoustic level.
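
As an illustration of the pattern-and-rule idea described above, the following minimal Python sketch matches lexical patterns in transcript segments and assigns the captured name to the current, previous, or next segment. The three regular expressions, the `NAME` expression, and the function `assign_speakers` are hypothetical examples chosen for this sketch; they are not the patterns or rules developed in the study, and the sketch assumes one speaker turn per segment.

```python
import re

# Hypothetical name matcher: two or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Hypothetical (pattern, target) pairs. The target says which segment,
# relative to the one containing the match, the name is assigned to.
PATTERNS = [
    (re.compile(r"\bI'm " + NAME), "current"),
    (re.compile(r"\b[Tt]hank you,? " + NAME), "previous"),
    (re.compile(NAME + r" reports\b"), "next"),
]

OFFSET = {"previous": -1, "current": 0, "next": 1}


def assign_speakers(segments):
    """Return one speaker label (or None) per transcript segment."""
    labels = [None] * len(segments)
    for i, text in enumerate(segments):
        for regex, target in PATTERNS:
            for match in regex.finditer(text):
                j = i + OFFSET[target]
                # Simplifying assumption: the first name assigned to a
                # segment is kept; later matches do not overwrite it.
                if 0 <= j < len(segments) and labels[j] is None:
                    labels[j] = match.group(1)
    return labels


segments = [
    "Good evening, I'm Jane Doe. John Smith reports.",
    "The storm hit the coast overnight.",
    "Thank you, John Smith. In other news, markets fell.",
]
labels = assign_speakers(segments)
```

In this toy input, the first segment names its own speaker and announces the next one, the third segment confirms the previous speaker, and the third segment itself stays unlabeled because no pattern refers to it. A practical system would need a way to resolve conflicting assignments and to handle segments that no pattern covers.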