On exploring the similarity and fusion of i-vector and sparse representation based speaker verification systems

The total variability based i-vector approach has become one of the dominant approaches for speaker verification. More recently, sparse representation (SR) based speaker verification approaches have also been proposed and are found to give comparable performance. In the SR based approach, the dictionary used for sparse representation is either exemplar based or learned from data using the K-SVD algorithm and its variants. The use of the total variability matrix of the i-vector system as the dictionary for the SR based approach has also been reported recently. Motivated by these, in this work we first highlight the similarity between the i-vector and the learned dictionary SR based approaches for speaker verification. We then explore various kinds of learned dictionaries, their sizes, and the sparsity constraint in the context of SR based speaker verification. Finally, we explore feature-level as well as score-level fusion of the two approaches.
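
The similarity mentioned above can be illustrated by writing the two models side by side. This is only a sketch in commonly used notation, not necessarily the formulation used in this work: the symbols $\mathbf{s}$ (GMM mean supervector of an utterance), $\mathbf{m}$ (speaker-independent mean supervector), $\mathbf{T}$ (total variability matrix), $\mathbf{w}$ (i-vector), $\mathbf{D}$ (learned dictionary), $\mathbf{x}$ (sparse code), and $k$ (sparsity constraint) are assumptions introduced here for illustration.

```latex
\begin{align}
\text{i-vector:}\quad
  & \mathbf{s} = \mathbf{m} + \mathbf{T}\mathbf{w},
  \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\
\text{SR:}\quad
  & \hat{\mathbf{x}} = \arg\min_{\mathbf{x}}
    \left\| (\mathbf{s} - \mathbf{m}) - \mathbf{D}\mathbf{x} \right\|_2^2
  \quad \text{subject to} \quad \|\mathbf{x}\|_0 \le k
\end{align}
```

In both cases the centered supervector is approximated as a linear combination of the columns of a matrix learned from data. Under this reading, the main difference lies in the constraint on the coefficients: a Gaussian prior on a low-dimensional $\mathbf{w}$ for the i-vector model, versus an explicit limit $k$ on the number of nonzero entries of $\mathbf{x}$ for the SR model.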
