Robust Principal Component Analysis Based Speaker Verification Under Additive Noise Conditions

Previous research shows that approaches based on the total variability space (TVS) followed by Gaussian probabilistic linear discriminant analysis (GPLDA) deal effectively with convolutional noise (such as channel noise) and also bring some gains in accuracy in additive noisy environments. However, they run into difficulty in the real world, where many types of noise are unseen and non-stationary. To address this issue, we introduce robust principal component analysis (RPCA) into the TVS-modeled speaker verification system. The resulting system, called RPCA-TVS, regards the noise spectrum as the low-rank component and the speech spectrum as the sparse component in the short-time Fourier transform (STFT) domain. The main contribution of this paper is to improve the robustness of speaker verification in additive noisy environments, especially under non-stationary and unseen noise conditions. To evaluate performance, we designed and generated an additive noisy corpus based on the TIMIT and NUST603-2014 databases, using the NaFT tools with 12 types of noise samples derived from NOISEX-92 and FREESOUND. Experimental results demonstrate that the proposed RPCA-TVS achieves better performance than the competing methods at various signal-to-noise ratio (SNR) levels. In particular, RPCA-TVS reduces the equal error rate (EER) by 5.12% on average compared with the multi-condition system under additive noise conditions at SNR = 8 dB.
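The low-rank plus sparse split described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic principal component pursuit solver (an inexact augmented Lagrangian iteration) applied to a nonnegative matrix that stands in for a magnitude spectrogram, with rows as frequency bins and columns as frames. The function names (`soft_threshold`, `svd_threshold`, `rpca`), the parameter defaults, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def rpca(m, lam=None, tol=1e-6, max_iter=500, rho=1.2):
    """Split m into low_rank + sparse by principal component pursuit.

    Hypothetical helper: lam defaults to 1 / sqrt(max(m.shape)), the
    weight commonly used for this problem; rho controls how fast the
    penalty parameter mu grows.
    """
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))
    norm_m = np.linalg.norm(m)
    mu = 1.25 / np.linalg.norm(m, 2)   # initial penalty from the spectral norm
    mu_max = mu * 1e7
    dual = np.zeros_like(m)            # Lagrange multiplier
    sparse = np.zeros_like(m)
    low_rank = np.zeros_like(m)
    for _ in range(max_iter):
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Synthetic stand-in for a magnitude spectrogram (60 bins x 80 frames):
# a rank-1 "noise" pattern plus a few large, isolated "speech" entries.
rng = np.random.default_rng(0)
noise_part = np.outer(rng.random(60) + 0.5, rng.random(80) + 0.5)
speech_part = np.zeros((60, 80))
mask = rng.random((60, 80)) < 0.05
speech_part[mask] = 5.0
spectrogram = noise_part + speech_part

low_rank_hat, sparse_hat = rpca(spectrogram)
rel_err = np.linalg.norm(low_rank_hat - noise_part) / np.linalg.norm(noise_part)
print(f"relative error of the low-rank estimate: {rel_err:.2e}")
```

In a real pipeline the input would be the magnitude of an utterance's STFT rather than synthetic data, and the sparse component (or a mask derived from it) would feed the feature extraction that precedes TVS modeling. How the paper combines the two components is not specified in the abstract, so that step is left out here.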
