Incremental On-Line Clustering of Speakers' Short Segments

This paper deals with clustering of speakers' short segments in a scenario where new segments keep arriving and must be clustered together with segments that have already been clustered. In realistic applications, it is not feasible to re-cluster all segments every time a new segment arrives. Hence, incremental clustering is applied in an on-line mode. A new segment may belong to an existing speaker, in which case it has to be assigned to one of the existing clusters, or to a new speaker, in which case a new cluster should be formed. In this work we show that if the off-line initial clustering process has enough segments per speaker, it constitutes a good starting point for the incremental on-line clustering. In this case, incremental on-line clustering can be successfully applied based on the previously proposed mean-shift clustering algorithm with the PLDA score as a similarity measure and with k-nearest neighbors (kNN) neighborhood selection.
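The incremental step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: cosine similarity stands in for the PLDA score, the new-speaker decision is a simple threshold on the best similarity to any stored segment, and the cluster label is taken by majority vote among the k nearest stored segments after the kNN mean-shift iterations. The names `cosine_scores`, `knn_mean_shift`, `IncrementalClusterer`, and `new_speaker_threshold` are hypothetical.

```python
from collections import Counter

import numpy as np


def cosine_scores(x, bank):
    """Stand-in for PLDA scoring (assumption): cosine similarity of x to each row of bank."""
    x = x / np.linalg.norm(x)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return bank @ x


def knn_mean_shift(x, bank, k=3, n_iter=20, tol=1e-6):
    """Move x to the mean of its k most similar stored vectors until it stops moving."""
    k = min(k, len(bank))
    for _ in range(n_iter):
        nearest = np.argsort(cosine_scores(x, bank))[-k:]
        new_x = bank[nearest].mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x


class IncrementalClusterer:
    """Keeps the off-line clustering result and assigns segments as they arrive."""

    def __init__(self, vectors, labels, k=3, new_speaker_threshold=0.5):
        self.vectors = np.asarray(vectors, dtype=float)
        self.labels = list(labels)
        self.k = k
        self.threshold = new_speaker_threshold  # hypothetical decision rule

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if cosine_scores(x, self.vectors).max() < self.threshold:
            # No stored segment is similar enough: open a cluster for a new speaker.
            label = max(self.labels) + 1
        else:
            # Shift toward the local mode, then vote among its k nearest neighbors.
            mode = knn_mean_shift(x, self.vectors, k=self.k)
            k = min(self.k, len(self.vectors))
            nearest = np.argsort(cosine_scores(mode, self.vectors))[-k:]
            label = Counter(self.labels[i] for i in nearest).most_common(1)[0][0]
        # Store the new segment so that later arrivals can use it as a neighbor.
        self.vectors = np.vstack([self.vectors, x])
        self.labels.append(label)
        return label


# Toy off-line result: two speakers with three segments each (made-up 2-D vectors).
initial_vectors = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
                   [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
initial_labels = [0, 0, 0, 1, 1, 1]

clusterer = IncrementalClusterer(initial_vectors, initial_labels)
print(clusterer.add([0.9, 0.1]))    # close to speaker 0
print(clusterer.add([-1.0, -1.0]))  # dissimilar to every stored segment
```

In this toy setup each arriving segment is scored only against stored segments, so the cost per arrival grows with the number of stored segments rather than requiring a full re-clustering. With a real PLDA back end, the threshold and k would have to be tuned on held-out data.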
