Clustering short push-to-talk segments

We present a method for clustering short push-to-talk speech segments when the number of speakers varies. An iterative Mean Shift algorithm based on the cosine distance performs speaker clustering on i-vectors extracted from many short speech segments. We report results in terms of accuracy, the average number of detected speakers (ANDS), the average cluster purity (ACP), the average speaker purity (ASP), and K. The method achieves clustering accuracies of 90.0%, 86.9%, and 72.1% for 3, 15, and 60 speakers, respectively.
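As an illustration of the general idea, the following is a minimal sketch of mean shift clustering under the cosine distance. It is not the authors' iterative variant: it assumes a flat (uniform) kernel, length-normalised input vectors standing in for i-vectors, and a hypothetical bandwidth parameter `bandwidth` that acts as a cosine-distance threshold. The function name `cosine_mean_shift` and all parameter values are assumptions made for this example.

```python
import numpy as np


def cosine_mean_shift(X, bandwidth=0.3, max_iter=50, tol=1e-6):
    """Flat-kernel mean shift under the cosine distance (illustrative sketch).

    X         : (n, d) array of vectors, e.g. stand-ins for i-vectors.
    bandwidth : hypothetical cosine-distance threshold defining a neighbourhood.
    Returns an integer array of n cluster labels.
    """
    # Length-normalise so that cosine distance equals 1 - dot product.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    modes = X.copy()  # every point starts as its own mode estimate

    for _ in range(max_iter):
        # Cosine distance from each current mode to every data point.
        dist = 1.0 - modes @ X.T
        within = (dist <= bandwidth).astype(float)
        # Shift each mode to the mean of its neighbours, then renormalise.
        shifted = within @ X
        shifted /= np.linalg.norm(shifted, axis=1, keepdims=True)
        # Largest cosine distance travelled by any mode in this iteration.
        moved = np.max(1.0 - np.sum(shifted * modes, axis=1))
        modes = shifted
        if moved < tol:
            break

    # Merge converged modes that lie within the bandwidth of each other.
    labels = -np.ones(len(X), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if 1.0 - m @ c <= bandwidth:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```

The number of clusters is not supplied in advance; it falls out of how many distinct modes survive the merge step, which is why a measure such as ANDS is informative for this family of methods. In practice the bandwidth would need tuning on held-out data.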
