Paper information - Segment generation and clustering in the HTK broadcast news transcription system

This paper describes the segmentation, gender detection and segment clustering scheme used in the 1997 HTK broadcast news evaluation system, and presents results on both the unpartitioned 1996 development set and the 1997 evaluation set. The stages of our approach are presented in turn: classification, segmentation and gender detection, gender relabelling, and clustering of speech segments. The evaluation audio stream was segmented according to audio type with a frame accuracy of up to 95%. Further segmentation and gender labelling gave up to 99% frame accuracy, with 127 multiple-speaker segments. Experiments using two different segmentation approaches and three clustering schemes are presented.

Steve Young | Thomas Hain | Philip C. Woodland | A. Tuerk | S. E. Johnson

[1] Thomas Hain et al. The 1997 HTK broadcast news transcription system, 1998.

[2] Philip C. Woodland et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[3] Steve Young et al. The development of the 1996 HTK broadcast news transcription system, 1996.

[4] M. A. Siegler et al. Automatic Segmentation, Classification and Clustering of Broadcast News Audio, 1997.

[5] Frédéric Bimbot et al. Text-free speaker recognition using an arithmetic-harmonic sphericity measure, 1993, EUROSPEECH.

[6] Mark J. F. Gales et al. Mean and variance adaptation within the MLLR framework, 1996, Comput. Speech Lang.