A comparative study of spectral clustering for i-vector-based speaker clustering under noisy conditions

This paper addresses speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering depends heavily on how well the similarity between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture speaker similarity under noisy conditions. We therefore examine the effectiveness of spectral clustering applied to i-vector-based similarity for noise-corrupted speech, since the non-linear projection used in spectral clustering can provide robustness against noise. Experimental comparisons demonstrate that spectral clustering yields a significant improvement over conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.
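
The pipeline described above (cosine similarity between i-vectors, a non-linear spectral projection, then clustering in the projected space) can be sketched roughly as follows. This is only an illustrative sketch in the style of the Ng-Jordan-Weiss spectral clustering algorithm, not the configuration used in the paper: the exponential kernel on cosine distance, the `sigma` value, the small hand-written k-means, and the synthetic "i-vectors" are all assumptions made for the example, and the number of speakers is assumed to be known.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T

def kmeans(points, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_cluster(ivectors, n_speakers, sigma=0.5):
    """Spectral clustering on an i-vector cosine-similarity graph (sketch)."""
    cos = cosine_similarity_matrix(ivectors)
    # Assumed kernel: map cosine distance (1 - cos) to a positive affinity.
    affinity = np.exp(-(1.0 - cos) / sigma)
    np.fill_diagonal(affinity, 0.0)
    # Symmetrically normalised affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalised = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the largest n_speakers.
    _, eigvecs = np.linalg.eigh(normalised)
    embedding = eigvecs[:, -n_speakers:]
    # Row-normalise the spectral embedding, then cluster it with k-means.
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return kmeans(embedding, n_speakers)

# Synthetic stand-in for i-vectors: 3 hypothetical speakers, 10 utterances each.
rng = np.random.default_rng(0)
speaker_means = rng.normal(size=(3, 20))
true_speakers = np.repeat(np.arange(3), 10)
ivectors = speaker_means[true_speakers] + 0.3 * rng.normal(size=(30, 20))

labels = spectral_cluster(ivectors, n_speakers=3)
```

Clustering the row-normalised eigenvector embedding, rather than the raw i-vectors, is the non-linear projection step; running k-means or agglomerative clustering directly on the cosine similarities corresponds to the conventional baselines mentioned above.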
