Benefits of prior acoustic segmentation for automatic speaker segmentation

This paper investigates the value of segmenting audio into acoustic macro-classes (such as gender or bandwidth) as a front-end processing step for the speaker segmentation/diarization task. The impact of this prior acoustic segmentation is evaluated in terms of speaker diarization performance in the context of the NIST RT'03 evaluation (conducted on the HUB4 broadcast news corpora). Although the topic is rarely discussed in the literature, our work shows that prior acoustic segmentation, applied in much the same way as in automatic speech recognition, can be very useful for speaker segmentation. Experiments were conducted using two different kinds of speaker segmentation systems, developed independently by the LIA and CLIPS laboratories in the framework of the ELISA consortium. For both systems, performance improved when they were combined with prior acoustic segmentation. The gain was larger, however, for the LIA system, which is based on an ascending/HMM approach, than for the CLIPS system, which is based on speaker turn detection.