Paper Information - Towards Using STT for Broadcast News Speaker Diarization

Towards Using STT for Broadcast News Speaker Diarization

The aim of this study is to investigate the use of the linguistic information present in the audio signal to structure broadcast news data, and in particular to associate speaker identities with audio segments. While speaker recognition has been an active area of research for many years, addressing the problem of identifying speakers in huge audio corpora is relatively recent and has been mainly concerned with speaker tracking. The speech transcriptions contain a wealth of linguistic information that is useful for speaker diarization. Patterns which can be used to identify the current, previous or next speaker have been developed based on the analysis of 150 hours of manually transcribed broadcast news data. Each pattern is associated with one or more rules to assign speaker identities. After validation on the training transcripts, these patterns and rules were tested on an independent data set containing transcripts of 9 hours of broadcasts, and a speaker diarization error rate of about 11% was obtained. Future work will validate the approach on automatically generated transcripts and also combine the linguistic information with information derived from the acoustic level.
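The abstract describes linguistic patterns that point to the current, previous or next speaker, each tied to a rule that assigns an identity. The following is a minimal sketch of that idea, not the authors' implementation: the three patterns, the `NAME` expression, the `RULES` table and the `assign_speakers` function are all hypothetical and chosen only for illustration, and the sample turns are invented.

```python
import re

# Hypothetical name matcher: two or more capitalized words (an assumption,
# far simpler than what real transcripts would need).
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Hypothetical patterns, NOT the ones developed in the study. Each entry pairs
# a pattern with the relative turn offset that the captured name identifies.
RULES = [
    (re.compile(r"\bI'm " + NAME), 0),               # self-introduction: current speaker
    (re.compile(r"\b[Tt]hank you,? " + NAME), -1),   # thanks: previous speaker
    (re.compile(NAME + r" reports\b"), +1),          # hand-off: next speaker
]


def assign_speakers(turns):
    """Return one speaker name (or None) per turn, given turn transcripts."""
    labels = [None] * len(turns)
    for i, text in enumerate(turns):
        for pattern, offset in RULES:
            match = pattern.search(text)
            target = i + offset
            # Only label turns that exist and have not been labeled yet.
            if match and 0 <= target < len(turns) and labels[target] is None:
                labels[target] = match.group(1)
    return labels


turns = [
    "Good evening, I'm Jane Doe. John Smith reports from Paris.",
    "The talks ended late last night.",
    "Thank you, John Smith. Now the weather.",
]
print(assign_speakers(turns))  # ['Jane Doe', 'John Smith', None]
```

In this toy run the first turn names both its own speaker and the next one, while the third turn stays unlabeled because no pattern refers to it. Turns left unlabeled in this way would presumably count toward an error measure such as the diarization error rate reported in the abstract.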

Jean-Luc Gauvain | Lori Lamel | Leonardo Canseco-Rodriguez
