Sparse representation over learned and discriminatively learned dictionaries for speaker verification

This work proposes a speaker verification (SV) method that uses the sparse representation of GMM mean-shifted supervectors over learned and discriminatively learned dictionaries. It is motivated by recently proposed SV methods that apply sparse representation classification (SRC) over exemplar dictionaries built from either GMM mean-shifted supervectors or i-vectors. On the NIST 2003 SRE dataset, the proposed approach with a discriminatively learned dictionary achieves an equal error rate of 1.53%, which is lower than that of SV systems of similar complexity developed using the i-vector based approach and the exemplar-based SRC approaches with session/channel variability compensation.
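As a rough illustration of what sparse representation over a dictionary means here, the following minimal sketch codes a vector with a greedy orthogonal matching pursuit and compares two sparse codes. Everything in it is an assumption for illustration: the random dictionary `D` stands in for a learned one, the function names `omp` and `cosine_score` are hypothetical, and scoring by cosine similarity between sparse codes is one plausible choice rather than the scoring rule confirmed by this work.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit (illustrative, not the paper's solver).

    Approximates y using at most n_nonzero columns (atoms) of D, where the
    columns of D are assumed to have unit norm and n_nonzero >= 1.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def cosine_score(a, b):
    """Cosine similarity between two sparse codes (assumed scoring rule)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical setup: a random unit-norm dictionary stands in for a learned one,
# and random vectors stand in for enrollment and test supervectors.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
enroll = rng.standard_normal(20)
test = enroll + 0.1 * rng.standard_normal(20)

score = cosine_score(omp(D, enroll, 5), omp(D, test, 5))
print(f"verification score: {score:.3f}")
```

In practice the score would be compared against a threshold tuned on development data, and the dictionary would be learned from training supervectors (for example with K-SVD) rather than drawn at random.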
