An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification

In this paper, we propose an integration of random subspace sampling and Fishervoice for speaker verification. In the previous random sampling framework [1], we randomly sampled the JFA feature space into a set of low-dimensional subspaces. For every random subspace, we used Fishervoice to model the intrinsic vocal characteristics in a discriminant subspace, so that the complex speaker characteristics were modeled through multiple subspaces. Through a fusion rule, we formed a more powerful and stable classifier that preserves most of the discriminative information. In many cases, however, random subspace sampling may discard too much useful discriminative information when the feature space is high-dimensional. Instead of increasing the number of random subspaces or using more complex fusion rules, both of which increase system complexity, we attempt to improve the performance of each individual weak classifier. Hence, we propose to investigate the integration of random subspace sampling with the Fishervoice approach. The proposed framework is shown to provide better performance on both the NIST SRE08 and NIST SRE10 evaluation corpora. In addition, we apply Probabilistic Linear Discriminant Analysis (PLDA) on the supervector space for comparison. Our proposed framework improves PLDA performance, with a relative decrease of 12.47% in EER and a reduction in minDCF from 0.0216 to 0.0210.
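The pipeline described above (sample random subspaces, learn a discriminant model in each, fuse the per-subspace scores) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synthetic toy vectors in place of JFA supervectors, substitutes a per-dimension Fisher ratio weighting for the full Fishervoice projection, and uses a plain average as the fusion rule. All function names (`sample_subspaces`, `fisher_weights`, `fused_score`, `make_toy_data`, `demo`) and parameter values are hypothetical.

```python
import math
import random


def sample_subspaces(dim, n_subspaces, subspace_dim, seed=0):
    """Draw random index sets; each keeps subspace_dim of the dim coordinates."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(dim), subspace_dim))
            for _ in range(n_subspaces)]


def fisher_weights(vectors, labels, idx):
    """Per-dimension sqrt(between / within scatter) on the selected coordinates.

    Assumption: a diagonal simplification standing in for a full
    discriminant projection such as Fishervoice.
    """
    by_speaker = {}
    for v, s in zip(vectors, labels):
        by_speaker.setdefault(s, []).append([v[i] for i in idx])
    d = len(idx)
    global_mean = [sum(v[i] for v in vectors) / len(vectors) for i in idx]
    between, within = [0.0] * d, [0.0] * d
    for rows in by_speaker.values():
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        for j in range(d):
            between[j] += len(rows) * (mean[j] - global_mean[j]) ** 2
            within[j] += sum((r[j] - mean[j]) ** 2 for r in rows)
    return [math.sqrt(b / (w + 1e-8)) for b, w in zip(between, within)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def fused_score(enroll, test, subspaces, weights):
    """Score a trial in every subspace, then fuse by averaging (sum rule)."""
    scores = []
    for idx, w in zip(subspaces, weights):
        e = [wj * enroll[i] for wj, i in zip(w, idx)]
        t = [wj * test[i] for wj, i in zip(w, idx)]
        scores.append(cosine(e, t))
    return sum(scores) / len(scores)


def make_toy_data(n_per_speaker=20, dim=20, seed=1):
    """Hypothetical data: two speakers that differ only in the first dim//2 coordinates."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for speaker, sign in (("A", 1.0), ("B", -1.0)):
        for _ in range(n_per_speaker):
            vectors.append([rng.gauss(3.0 * sign if j < dim // 2 else 0.0, 0.5)
                            for j in range(dim)])
            labels.append(speaker)
    return vectors, labels


def demo():
    """Return fused scores for a same-speaker trial and a different-speaker trial."""
    vectors, labels = make_toy_data()
    subspaces = sample_subspaces(dim=20, n_subspaces=8, subspace_dim=10)
    weights = [fisher_weights(vectors, labels, idx) for idx in subspaces]
    # Toy shortcut: trials reuse training vectors (0 and 1 are speaker A, 20 is speaker B).
    same = fused_score(vectors[0], vectors[1], subspaces, weights)
    diff = fused_score(vectors[0], vectors[20], subspaces, weights)
    return same, diff


if __name__ == "__main__":
    same, diff = demo()
    print("same-speaker score:", same, "different-speaker score:", diff)
```

In this sketch each subspace acts as one weak classifier, and the trade-off named in the abstract is visible in the parameters: a smaller `subspace_dim` discards more coordinates per classifier, which can be offset either by a larger `n_subspaces` or by making each per-subspace model more discriminative.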

[1]  Zhifeng Li, et al. An enhanced Fishervoice subspace framework for text-independent speaker verification, 2010, 2010 7th International Symposium on Chinese Spoken Language Processing.

[2]  Tin Kam Ho, et al. Nearest Neighbors in Random Subspaces, 1998, SSPR/SPR.

[3]  Bin Ma, et al. PLDA Modeling in I-Vector and Supervector Space for Speaker Verification, 2012, INTERSPEECH.

[4]  Sergey Ioffe, et al. Probabilistic Linear Discriminant Analysis, 2006, ECCV.

[5]  Dima Damen, et al. Recognizing linked events: Searching the space of feasible explanations, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Sridha Sridharan, et al. Feature warping for robust speaker verification, 2001, Odyssey.

[7]  Xiaogang Wang, et al. A unified framework for subspace face recognition, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Patrick Kenny, et al. Bayesian Speaker Verification with Heavy-Tailed Priors, 2010, Odyssey.

[9]  Patrick Kenny, et al. A Study of Interspeaker Variability in Speaker Verification, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  李志锋. An Analysis Framework based on Random Subspace Sampling for Speaker Verification, 2011.

[11]  Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.

[12]  Xiaogang Wang, et al. Random Sampling for Subspace Face Recognition, 2006, International Journal of Computer Vision.

[13]  Xiaogang Wang, et al. Random sampling LDA for face recognition, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[14]  Tin Kam Ho, et al. The Random Subspace Method for Constructing Decision Forests, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[15]  Zhifeng Li, et al. Fishervoice: A discriminant subspace framework for speaker recognition, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[16]  Eero P. Simoncelli, et al. Nonlinear Extraction of Independent Components of Natural Images Using Radial Gaussianization, 2009, Neural Computation.

[17]  Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  James H. Elder, et al. Probabilistic Linear Discriminant Analysis for Inferences About Identity, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Daniel Garcia-Romero, et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems, 2011, INTERSPEECH.