Clustering similar acoustic classes in the Fishervoice framework

In the Fishervoice (FSH) framework, the mean supervector of each speaker model is divided into several subvectors by mixture index. This division captures neither the local structure shared by similar acoustic classes nor the discriminative information between different acoustic classes. To test whether local structure information improves system performance, we develop five different speaker supervector segmentation methods. Experiments on NIST SRE08 show that clustering similar acoustic classes together improves system performance. In particular, the proposed equal-size clustering method achieves a 5.1% relative reduction in equal error rate (EER) compared with FSH1.
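The contrast between the two segmentation strategies can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so the greedy capacity-constrained k-means below is an assumption, as are the function names (`split_by_index`, `equal_size_clusters`, `subvectors`), the use of UBM mixture means as the representation of each acoustic class, and the requirement that the number of mixtures be divisible by the number of groups.

```python
import numpy as np

def split_by_index(num_mixtures, num_groups):
    """Baseline: consecutive mixture indices form each group."""
    return np.arange(num_mixtures).reshape(num_groups, -1)

def equal_size_clusters(means, num_groups, num_iters=10, seed=0):
    """Group mixture means into clusters of identical size.

    Hypothetical greedy variant of k-means: (mixture, cluster) pairs are
    visited from closest to farthest, and a mixture joins a cluster only
    if that cluster still has free capacity.
    """
    rng = np.random.default_rng(seed)
    num_mixtures = means.shape[0]
    size = num_mixtures // num_groups  # assumes exact divisibility
    centroids = means[rng.choice(num_mixtures, num_groups, replace=False)]
    for _ in range(num_iters):
        # Squared Euclidean distance, shape (num_mixtures, num_groups).
        dist = ((means[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = np.full(num_mixtures, -1)
        counts = np.zeros(num_groups, dtype=int)
        for flat in np.argsort(dist, axis=None):
            i, k = divmod(int(flat), num_groups)
            if labels[i] < 0 and counts[k] < size:
                labels[i] = k
                counts[k] += 1
        centroids = np.stack(
            [means[labels == k].mean(axis=0) for k in range(num_groups)]
        )
    return np.stack([np.flatnonzero(labels == k) for k in range(num_groups)])

def subvectors(supervector, groups, feat_dim):
    """Cut a mean supervector into one subvector per group of mixtures."""
    per_mixture = supervector.reshape(-1, feat_dim)
    return [per_mixture[g].ravel() for g in groups]
```

Passing the output of `split_by_index` to `subvectors` reproduces the division by mixture index, while passing the output of `equal_size_clusters` places similar acoustic classes in the same subvector and keeps every subvector the same dimension.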
