Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech
Speaker Diarization and Automatic Speech Recognition have been topics of research for decades, and evaluating the resulting systems has been necessary for almost as long. Following the NIST initiatives, a number of metrics have become standard for these evaluations, namely the Diarization Error Rate and the Word Error Rate. The initial definitions of these metrics and, more importantly, their implementations were designed for single-speaker speech. One of the aims of the OSEO Quaero and ANR ETAPE projects was to investigate the capabilities of diarization and ASR systems in the presence of overlapping speech. Evaluating such systems required extending the metric definitions and adapting the algorithmic approaches used to implement them. This paper presents these extensions and adaptations, along with the open tools that provide them.
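For context, the NIST-style Diarization Error Rate sums missed speech, false alarm and speaker confusion time and normalizes by the total scored speaker time; under overlap, every reference speaker active at a given instant contributes to that scored time, so overlapped regions weigh more. The sketch below is a minimal frame-based illustration of that idea, not the paper's open tooling: the segment format, the 10 ms frame step, and the simplifying assumption that hypothesis speaker labels are already optimally mapped to reference labels (the real metric solves that mapping first) are assumptions made here for brevity.

```python
from collections import defaultdict

def frame_speakers(segments, step=0.01):
    """Map each frame index to the set of speakers active in it.

    segments: list of (speaker, start_sec, end_sec) tuples (assumed format).
    """
    frames = defaultdict(set)
    for spk, start, end in segments:
        for i in range(int(round(start / step)), int(round(end / step))):
            frames[i].add(spk)
    return frames

def diarization_error_rate(reference, hypothesis, step=0.01):
    """Frame-based DER with overlap handling: each reference speaker in a
    frame counts toward the scored time, so overlap is fully evaluated.
    Assumes hypothesis labels are already mapped to reference labels."""
    ref = frame_speakers(reference, step)
    hyp = frame_speakers(hypothesis, step)
    scored = missed = false_alarm = confusion = 0
    for i in set(ref) | set(hyp):
        n_ref, n_hyp = len(ref[i]), len(hyp[i])
        correct = len(ref[i] & hyp[i])          # matched speakers this frame
        scored += n_ref                         # overlapped frames count once per speaker
        missed += max(n_ref - n_hyp, 0)         # reference speakers with no hypothesis slot
        false_alarm += max(n_hyp - n_ref, 0)    # hypothesis speakers with no reference slot
        confusion += min(n_ref, n_hyp) - correct
    return (missed + false_alarm + confusion) / scored if scored else 0.0
```

A usage call such as `diarization_error_rate([("A", 0.0, 5.0), ("B", 3.0, 8.0)], [("A", 0.0, 4.0), ("B", 4.0, 8.0)])` scores the 3.0 s to 5.0 s overlap region against both reference speakers, which is precisely the behavior that single-speaker implementations of the metric do not provide.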