Paper information - Speaker diarization from speech transcripts

Speaker diarization from speech transcripts

The aim of this study is to investigate the use of the linguistic information present in the audio signal to structure broadcast news data, and in particular to associate speaker identities with audio segments. While speaker recognition has been an active area of research for many years, addressing the problem of identifying speakers in huge audio corpora is relatively recent and has been mainly concerned with speaker tracking. The speech transcriptions contain a wealth of linguistic information that is useful for speaker diarization. Patterns which can be used to identify the current, previous or next speaker have been developed based on the analysis of 150 hours of manually transcribed broadcast news data. Each pattern is associated with one or more rules. After validation on the training transcripts, these patterns and rules were tested on an independent data set containing transcripts of 10 hours of broadcasts.
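The pattern-and-rule idea in the abstract can be illustrated with a minimal sketch. The three regular expressions below, the sample turns, and the names `NAME`, `PATTERNS`, and `apply_patterns` are assumptions made for illustration only; they are not the patterns or rules developed in the paper. Each hypothetical pattern captures a name, and its rule says whether that name labels the current, previous, or next speaker turn.

```python
import re

# Hypothetical patterns, not taken from the paper. NAME captures two or more
# capitalized words, a crude stand-in for a person name.
NAME = r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Each entry pairs a pattern with a rule: which turn the captured name labels.
PATTERNS = [
    (re.compile(r"\b[Tt]his is " + NAME), "current"),       # self-introduction
    (re.compile(r"\b[Tt]hank you,? " + NAME), "previous"),  # thanks the prior speaker
    (re.compile(NAME + r" reports\b"), "next"),             # hands over to a reporter
]

OFFSETS = {"previous": -1, "current": 0, "next": 1}


def apply_patterns(turns):
    """Return {turn_index: name} for every turn that some rule can label."""
    labels = {}
    for i, text in enumerate(turns):
        for regex, target in PATTERNS:
            for match in regex.finditer(text):
                j = i + OFFSETS[target]
                if 0 <= j < len(turns):
                    # Keep the first name proposed for a turn.
                    labels.setdefault(j, match.group("name"))
    return labels


# Invented example transcript, one string per speaker turn.
turns = [
    "Good evening. This is Jane Smith. Our top story: John Doe reports.",
    "Flooding continues across the region tonight.",
    "Thank you, John Doe. Now the weather.",
]
print(apply_patterns(turns))
```

In this toy transcript the second turn contains no name at all, yet it is labeled from cues in the neighboring turns, which is the kind of evidence the abstract describes for identifying the previous or next speaker.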

Jean-Luc Gauvain | Lori Lamel | Leonardo Canseco-Rodriguez
