Exploring Sparse Representation Classification for Speaker Verification in Realistic Environment

We address the problem of speaker verification (SV) by exploiting the discriminative classification ability of sparse representation. The proposed sparse representation based speaker verification (SR-SV) system uses a dictionary built from the mean supervectors of adapted GMMs. For classification, the sparse coefficients obtained by ℓ1 minimization are scored with several different scoring methods. The SV systems are developed using speech data collected in realistic environments with multiple sensors. Compared with a standard 1024-mixture GMM-UBM system, the 128-mixture GMM based SR-SV system performs better on the data from all four sensors considered.
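The pipeline described above (supervector dictionary, ℓ1-based sparse coding, coefficient scoring) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: random unit vectors stand in for GMM mean supervectors, ISTA on an ℓ1-regularized least-squares objective stands in for the ℓ1 minimization, and the helper names `sparse_code` and `claim_score`, the regularization weight `lam`, and the ℓ1-mass-ratio score are all hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, y, lam=0.05, n_iter=500):
    """ISTA for min_x 0.5*||y - D x||_2^2 + lam*||x||_1.

    Assumption: a regularized stand-in for the l1 minimization
    mentioned in the abstract, not the authors' solver.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def claim_score(x, claimed_idx):
    """Hypothetical score: share of the l1 mass on the claimed speaker's atoms."""
    total = np.abs(x).sum()
    return float(np.abs(x[claimed_idx]).sum() / total) if total > 0 else 0.0

rng = np.random.default_rng(0)
dim, n_speakers = 64, 10

# Toy dictionary: one unit-norm column per enrolled speaker
# (random vectors standing in for GMM mean supervectors).
D = rng.standard_normal((dim, n_speakers))
D /= np.linalg.norm(D, axis=0)

# Toy test supervector: a noisy copy of speaker 3's atom.
test = D[:, 3] + 0.05 * rng.standard_normal(dim)
test /= np.linalg.norm(test)

x = sparse_code(D, test)
target_score = claim_score(x, [3])    # true claim
impostor_score = claim_score(x, [7])  # false claim
print(target_score, impostor_score)
```

In this toy setup a claim would be accepted when its score exceeds a threshold tuned on development data; the abstract does not say which scoring methods or thresholds the paper uses.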
