GTTS System for the Albayzin 2010 Speaker Diarization Evaluation

This paper briefly describes the diarization system developed by the Software Technology Working Group (http://gtts.ehu.es) at the University of the Basque Country (EHU) for the Albayzin 2010 Speaker Diarization Evaluation. The system consists of three decoupled elements: (1) speech/non-speech segmentation; (2) acoustic change detection; and (3) clustering of speech segments. Speech/non-speech segmentation is performed by one of the systems submitted to the Albayzin 2010 Audio Segmentation Evaluation. To detect speaker changes, speech segments are further split by a naive metric-based approach that locates the most likely spectral change points. The third element is based on a dot-scoring speaker verification system: speech segments are represented by MAP-adapted GMM zeroth- and first-order statistics, dot scoring is applied to compute a similarity measure between segments (or clusters), and finally an agglomerative clustering algorithm merges clusters until no pair of clusters exceeds a similarity threshold.
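The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `agglomerative_dot_clustering`, the use of plain NumPy vectors as stand-ins for the MAP-adapted GMM statistics, the cosine normalization of the dot score, and the merge rule (summing the vectors of the merged clusters) are all assumptions made for illustration.

```python
import numpy as np


def agglomerative_dot_clustering(stats, threshold):
    """Greedy bottom-up clustering driven by a dot-product similarity.

    stats:     (n_segments, dim) array. Each row is a hypothetical stand-in
               for the centered first-order statistics of one speech segment.
    threshold: merging stops once no pair of clusters scores above this value.
    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(stats))]
    vectors = [np.asarray(row, dtype=float) for row in stats]

    while len(clusters) > 1:
        best_score, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumption: the dot score is length-normalized (cosine) so
                # that scores stay comparable as clusters grow.
                norm_i = max(np.linalg.norm(vectors[i]), 1e-12)
                norm_j = max(np.linalg.norm(vectors[j]), 1e-12)
                score = float(vectors[i] @ vectors[j]) / (norm_i * norm_j)
                if score > best_score:
                    best_score, best_pair = score, (i, j)

        # Stop when no pair of clusters exceeds the similarity threshold.
        if best_score <= threshold:
            break

        i, j = best_pair
        # Assumption: statistics are additive, so a merged cluster is
        # represented by the sum of its members' vectors.
        vectors[i] = vectors[i] + vectors[j]
        clusters[i] = sorted(clusters[i] + clusters[j])
        del vectors[j]
        del clusters[j]

    return clusters
```

For example, calling the function on four two-dimensional toy vectors, two pointing mostly along each axis, with a threshold of 0.5 groups the segments into two clusters. In a real system the threshold would be tuned on development data.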

Germán Bordel | Mikel Penagarikano | Amparo Varona | Mireia Diez | Luis Javier Rodriguez-Fuentes
