Paper Information - Use of Vocal Source Features in Speaker Segmentation

Use of Vocal Source Features in Speaker Segmentation

This paper addresses the problem of speaker segmentation in telephone conversations. The segmentation is done in three steps: 1) preliminary segmentation to hypothesize speaker turning points; 2) clustering of segments; and 3) re-segmentation to determine the speaker identity of each segment. It is found that vocal source related features are more speaker-discriminative than the conventional vocal tract related features when only a small amount of data is available. This motivates us to incorporate vocal source features into the early stages of the speaker segmentation process, where decisions have to be made with limited data. Speaker segmentation experiments are carried out on 36 summed-channel conversations from the NIST 2004 Speaker Recognition Evaluation. The proposed use of vocal source features leads to a noticeable performance improvement.
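
The three-step structure described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the scalar `frames` stand in for real per-frame features, and the window-mean distance, the agglomerative merging rule, and the nearest-centroid assignment are simple placeholders, not the features, models, or criteria used in the paper.

```python
from statistics import mean


def preliminary_segmentation(frames, win=3, threshold=1.0):
    """Step 1: hypothesize turning points where adjacent windows differ most."""
    scores = {}
    for t in range(win, len(frames) - win + 1):
        scores[t] = abs(mean(frames[t - win:t]) - mean(frames[t:t + win]))
    boundaries = [0]
    for t, s in scores.items():
        # Keep local maxima of the distance curve that exceed the threshold.
        if s > threshold and s >= scores.get(t - 1, 0.0) and s > scores.get(t + 1, 0.0):
            boundaries.append(t)
    boundaries.append(len(frames))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]


def cluster_segments(frames, segments, n_speakers=2):
    """Step 2: merge the closest clusters until n_speakers remain."""
    clusters = [[seg] for seg in segments]

    def centroid(cluster):
        return mean([x for s, e in cluster for x in frames[s:e]])

    while len(clusters) > n_speakers:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: abs(centroid(clusters[p[0]]) - centroid(clusters[p[1]])))
        clusters[i] += clusters.pop(j)
    return [centroid(c) for c in clusters]


def resegment(frames, segments, centroids):
    """Step 3: label each segment with its nearest speaker model."""
    labeled = []
    for s, e in segments:
        m = mean(frames[s:e])
        spk = min(range(len(centroids)), key=lambda k: abs(m - centroids[k]))
        if labeled and labeled[-1][2] == spk:
            # Adjacent segments from the same speaker are merged.
            labeled[-1] = (labeled[-1][0], e, spk)
        else:
            labeled.append((s, e, spk))
    return labeled


# Toy "conversation": two speakers alternating, one scalar feature per frame.
frames = [0.0, 0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0,
          0.1, 0.0, -0.1, 0.1, 5.1, 4.9, 5.0]
segments = preliminary_segmentation(frames)
centroids = cluster_segments(frames, segments)
result = resegment(frames, segments, centroids)
print(result)
```

Note that step 1 decides on only a few frames per window, which is the limited-data situation in which the abstract reports vocal source features to be most useful.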

Hua Ouyang | Wai Nang Chan | Nengheng Zheng | Tan Lee
