Paper information - Supervector-based approaches in a discriminative framework for speaker verification in noisy environments

This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two different frameworks: (i) the combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) the total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances are transformed into supervectors and subjected to SVM-based pattern matching and classification. In the total variability modeling method, high-dimensional supervectors are reduced to a low-dimensional, channel-robust vector (i-vector) prior to SVM training and subsequent evaluation. Special emphasis is laid on the significance of an utterance partitioning technique for mitigating data imbalance and utterance duration mismatches. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of SVM classifiers. Experiments performed on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that the aforementioned modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. The use of utterance partitioning and adaptive boosting in the speaker modeling frameworks results in substantial performance improvements under degraded conditions.
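The supervector construction and utterance partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a diagonal-covariance UBM, mean-only relevance-MAP adaptation (the abstract says only "adapted speaker GMMs"), and plain contiguous partitioning of the frame sequence. All function names, the toy UBM, the toy frames, and the relevance factor of 16 are hypothetical choices made for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Responsibility of each UBM component for one frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt the UBM means to one utterance and concatenate them.

    Assumed adaptation rule (mean-only relevance MAP):
    adapted_c = (sum_t gamma_t(c) * x_t + r * mu_c) / (n_c + r)
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        for c, gamma in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    return [
        (sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
        for c in range(n_comp)
        for d in range(dim)
    ]


def partition(frames, n_parts):
    """Split a frame sequence into contiguous, near-equal sub-utterances."""
    size, extra = divmod(len(frames), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(frames[start:end])
        start = end
    return parts


# Toy two-component, two-dimensional UBM and a six-frame "utterance".
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [3.9, 4.2], [0.3, 0.1],
          [4.1, 3.8], [-0.2, 0.0], [4.0, 4.1]]

# One supervector for the full utterance plus one per sub-utterance.
vectors = [supervector(frames, ubm_weights, ubm_means, ubm_vars)]
vectors += [
    supervector(part, ubm_weights, ubm_means, ubm_vars)
    for part in partition(frames, 2)
]
```

Each enrollment utterance here yields several supervectors instead of one, which is how partitioning can increase the number of target-speaker training vectors available to a speaker-specific SVM. SVM training itself is left out of the sketch.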

Sourjya Sarkar | K. Sreenivasa Rao
