Paper information - Combining deep speaker specific representations with GMM-SVM for speaker verification

Combining deep speaker specific representations with GMM-SVM for speaker verification

This study combines a Gaussian mixture model support vector machine (GMM-SVM) system with a nonlinear feature transformation that is discriminatively trained to extract speaker-specific features from MFCCs. A regularized siamese deep network (RSDN) separates speaker information from non-speaker information in the speech signal. The RSDN learns a hidden representation that characterizes speaker information well by training a subset of the hidden units on pairs of speech segments. MFCC features are fed to a trained RSDN, and a subset of the hidden-layer outputs is used as the input features of a GMM-SVM system. We demonstrate the potential of this approach for text-independent speaker verification by applying it to a subset of the NIST SRE 2006 1conv4w-1conv4w task. The hybrid RSDN GMM-SVM system achieves about 5% relative improvement over the baseline GMM-SVM system.
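The feature-extraction step described in the abstract (MFCC frames in, a subset of hidden-layer outputs out) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the sigmoid activation, the choice of the first 40 units as the speaker-specific subset, and the random weights standing in for a trained RSDN are all assumptions made for illustration. The function name `extract_speaker_features` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic activation (assumed; the abstract does not name one)."""
    return 1.0 / (1.0 + np.exp(-x))


def extract_speaker_features(mfcc, weights, biases, n_speaker_units):
    """Pass MFCC frames through the encoder layers and keep a subset of
    the last hidden layer as the new frame-level features.

    mfcc:            array of shape (n_frames, n_mfcc)
    weights, biases: per-layer parameters of an already trained encoder
    n_speaker_units: how many hidden units are treated as speaker-specific
                     (assumed here to be the first units of the layer)
    """
    h = mfcc
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
    return h[:, :n_speaker_units]


# Placeholder setup: random weights stand in for a trained network.
rng = np.random.default_rng(0)
n_frames, n_mfcc = 200, 39           # hypothetical MFCC dimensionality
layer_sizes = [n_mfcc, 100, 100]     # hypothetical encoder layout
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

mfcc = rng.normal(size=(n_frames, n_mfcc))  # synthetic frames, not real speech
features = extract_speaker_features(mfcc, weights, biases, n_speaker_units=40)
print(features.shape)  # (200, 40)
```

In the pipeline the abstract describes, frame-level features like `features` would replace the raw MFCCs as input to the GMM-SVM back end; that stage is omitted here.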
