Paper Information - VIVOLAB-UZ Speaker Diarization System for the Albayzin 2010 Evaluation Campaign

VIVOLAB-UZ Speaker Diarization System for the Albayzin 2010 Evaluation Campaign

Abstract: This paper describes the speaker diarization systems proposed by the VIVOLAB-UZ group for the Albayzin 2010 speaker diarization evaluation. Our approaches combine recent improvements in speaker segmentation for two-speaker telephone conversations, based on eigenvoice modeling, with the traditional Agglomerative Hierarchical Clustering approach. We present two submissions. Our first system uses a simple eigenvoice factor analysis model to extract a stream of speaker factors for every recording that enable better speaker separability. The speaker factor stream is used for speaker segmentation. The clusters obtained are then agglomerated using the Bayesian Information Criterion as the distance metric, yielding the speaker labels. Our second submission is exactly the same system, but it uses Viterbi resegmentation to refine speaker change points as a final step.

Index Terms: speaker diarization, factor analysis, intra-session variability, Agglomerative Hierarchical Clustering, Bayesian Information Criterion

Carlos Vaquero | Alfonso Ortega | Eduardo Lleida

[1] Douglas A. Reynolds, et al. A study of new approaches to speaker diarization, 2009, INTERSPEECH.

[2] Marijn Huijbregts, et al. The ICSI RT07s Speaker Diarization System, 2007, CLEAR.

[3] Eduardo Lleida, et al. Confidence measures for speaker segmentation and their relation to speaker verification, 2010, INTERSPEECH.

[4] Pietro Laface, et al. Stream-based speaker segmentation using speaker factors and eigenvoices, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5] E. Lleida, et al. Intra-session Variability Compensation for Speaker Segmentation, 2010.

[6] Stephen E. Levinson, et al. Continuously variable duration hidden Markov models for automatic speech recognition, 1986.

[7] Eduardo Lleida, et al. Intra-session variability compensation and a hypothesis generation and selection strategy for speaker segmentation, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Patrick Kenny, et al. A Study of Interspeaker Variability in Speaker Verification, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[9] Roland Kuhn, et al. Rapid speaker adaptation in eigenvoice space, 2000, IEEE Trans. Speech Audio Process.