Supervector-based approaches in a discriminative framework for speaker verification in noisy environments

This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two frameworks: (i) the combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) the total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances are transformed into supervectors and subjected to SVM-based pattern matching and classification. In the total variability modeling method, high-dimensional supervectors are reduced to low-dimensional, channel-robust vectors (i-vectors) prior to SVM training and subsequent evaluation. Special emphasis is placed on the significance of an utterance partitioning technique for mitigating data imbalance and utterance duration mismatch. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of the SVM classifiers. Experiments on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that both modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. The use of utterance partitioning and adaptive boosting in the speaker modeling frameworks results in substantial performance improvements under degraded conditions.
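
As an illustration of the supervector construction and utterance partitioning described above, the following minimal Python sketch adapts the means of a diagonal-covariance UBM to an utterance, concatenates them into a supervector, and repeats this for each sub-utterance. It is not the authors' implementation. The function names, the toy UBM, the relevance factor of 16, mean-only MAP adaptation, and contiguous (unshuffled) partitioning are all assumptions made for the example; the abstract does not specify these details.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian: (T, D) -> (T, C)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of a UBM (weights, means, variances) to frames X."""
    weights, means, variances = ubm
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)          # frame-level posteriors (T, C)
    n = post.sum(axis=0)                             # soft counts per component
    first_order = post.T @ X                         # (C, D)
    data_mean = first_order / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]           # data-dependent mixing weight
    return alpha * data_mean + (1.0 - alpha) * means

def supervector(X, ubm, relevance=16.0):
    """Concatenate the adapted component means into one vector of length C * D."""
    return map_adapt_means(X, ubm, relevance).ravel()

def partitioned_supervectors(X, ubm, n_parts, relevance=16.0):
    """One supervector for the full utterance plus one per contiguous sub-utterance."""
    parts = np.array_split(X, n_parts)
    rows = [supervector(X, ubm, relevance)]
    rows += [supervector(p, ubm, relevance) for p in parts]
    return np.stack(rows)

# Toy data (assumed): a 4-component UBM over 3-dimensional features, 200 frames.
rng = np.random.default_rng(0)
ubm = (np.full(4, 0.25), rng.normal(size=(4, 3)), np.ones((4, 3)))
frames = rng.normal(size=(200, 3))
svs = partitioned_supervectors(frames, ubm, n_parts=4)
```

Each row of `svs` could then serve as one training example for a speaker-specific SVM, so partitioning a single enrollment utterance into four parts yields five speaker-class supervectors instead of one. This is the sense in which partitioning can ease the imbalance between speaker and background examples.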
