Spherical Discriminant Analysis in Semi-supervised Speaker Clustering

Semi-supervised speaker clustering refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. Supplied in the form of an independent training set, the prior knowledge helps us learn a speaker-discriminative feature transformation, a universal speaker prior model, and a discriminative speaker subspace, or equivalently a speaker-discriminative distance metric. The directional scattering patterns of Gaussian mixture model mean supervectors motivate us to perform discriminant analysis on the unit hypersphere rather than in the Euclidean space, which leads to a novel dimensionality reduction technique called spherical discriminant analysis (SDA). Our experimental results show that speaker clustering in the SDA subspace outperforms speaker clustering in other reduced-dimensional subspaces (e.g., PCA and LDA).
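To make the motivation for working on the unit hypersphere concrete, the following is a minimal sketch, not the SDA algorithm itself. It assumes three toy 3-dimensional vectors standing in for GMM mean supervectors (real supervectors are far higher-dimensional), and the helper names `normalize`, `cosine_similarity`, and `euclidean_distance` are hypothetical. It shows that when the information lies in a vector's direction, Euclidean distance and cosine similarity can rank neighbors differently.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """Inner product of the two vectors after projection onto the hypersphere."""
    un, vn = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(un, vn))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between the raw (unnormalized) vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy stand-ins for supervectors (illustrative values, not real data).
a = [1.0, 2.0, 2.0]
b = [3.0, 6.0, 6.0]   # same direction as a, three times the magnitude
c = [2.0, -1.0, 0.0]  # orthogonal to a, but closer to a in magnitude

print("cos(a, b) =", cosine_similarity(a, b))
print("cos(a, c) =", cosine_similarity(a, c))
print("dist(a, b) =", euclidean_distance(a, b))
print("dist(a, c) =", euclidean_distance(a, c))
```

In this toy case the Euclidean distance places `c` nearer to `a` than `b` is, even though `b` points in exactly the same direction as `a`. A discriminant analysis defined on the hypersphere compares directions only, which is the property the directional scattering of supervectors calls for.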
