Enrollee-constrained sparse coding of test data for speaker verification

Abstract: Recent work has reported the successful use of sparse representation (SR) over a learned dictionary for the speaker verification (SV) task. For practical data with large variability, SR-based approaches are noted to produce inconsistent sparse codes. In other words, for true-target trials, the dominant coefficients in the sparse codes of the enrollment and test data often involve different atoms of the dictionary. This, in turn, increases the false rejection rate. In this work, we propose a novel yet simple way to address that problem. The key idea is to exploit the sparse coding of the enrollment data when finding the representation of the test data. Because the proposed constraint raises the false alarm rate, multi-offset decimation diversity is introduced to counter that effect. The combined approach has lower computational complexity, yet is shown to outperform the existing factor-analysis-based SV approach when evaluated on the large-variability NIST 2012 speaker recognition evaluation dataset.
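The key idea in the abstract, coding the test data under a constraint taken from the enrollment code, can be illustrated with a minimal sketch. The abstract does not specify how the constraint is implemented, so everything below is an assumption made for illustration: the helper names `omp` and `constrained_code`, the random dictionary, the sparsity level of 5, the least-squares fit restricted to the enrollment support, and cosine scoring. The multi-offset decimation step is not shown.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x with k atoms of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code, support

def constrained_code(D, x, support):
    """Code x using only the atoms chosen for the enrollment data (assumed constraint)."""
    coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def cosine(a, b):
    """Cosine similarity, used here as a hypothetical verification score."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy data: a random unit-norm dictionary stands in for a learned one.
rng = np.random.default_rng(0)
D = rng.standard_normal((40, 100))
D /= np.linalg.norm(D, axis=0)
speaker = D[:, [3, 17, 42, 58, 91]] @ np.array([1.5, -1.0, 0.8, 1.2, -0.7])
enroll = speaker + 0.05 * rng.standard_normal(40)  # enrollment utterance
test = speaker + 0.05 * rng.standard_normal(40)    # test utterance, same speaker

enroll_code, support = omp(D, enroll, k=5)
test_code = constrained_code(D, test, support)  # test reuses enrollment atoms
score = cosine(enroll_code, test_code)
print(f"support={sorted(support)} score={score:.3f}")
```

Because the test code is confined to the enrollment atoms, a true-target trial cannot scatter its dominant coefficients onto different atoms. The same confinement can also inflate scores for impostor trials, which matches the abstract's remark that the constraint affects the false alarm rate.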
