Noise-robust speaker recognition based on morphological component analysis

Speaker recognition suffers severe performance degradation under noisy environments. To solve this problem, we propose a novel method based on morphological component analysis. This method employs a universal background dictionary (UBD) to model common variability of all speakers, a speech dictionary of each speaker to model special variability of this speaker and a noise dictionary to model variability of environmental noise. These three dictionaries are concatenated to be a big dictionary, over which test speech is sparsely represented and classified. To improve the discriminability of speaker dictionaries, we optimize the speaker dictionaries by removing speaker atoms which are close to the UBD atoms. To ensure varying noises can be tracked, we design an algorithm to update the noise dictionary with the noisy speech. We finally conduct experiments under various noise conditions and the results show that the proposed method can obviously improve the robustness of speaker recognition under noisy environments.

[1]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  E. Ambikairajah,et al.  Speaker verification using sparse representation classification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  B. Atal Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.

[4]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[5]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[6]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[7]  David L Donoho,et al.  Compressed sensing , 2006, IEEE Transactions on Information Theory.

[8]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[9]  Olli Viikki,et al.  A recursive feature vector normalization approach for robust speech recognition in noise , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[10]  Mohammed Bennamoun,et al.  Sparse Representation for Speaker Identification , 2010, 2010 20th International Conference on Pattern Recognition.

[11]  E. Ambikairajah,et al.  Subband noise estimation for speech enhancement using a perceptual Wiener filter , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[12]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[13]  José L. Pérez-Córdoba,et al.  Histogram equalization of speech representation for robust speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[14]  James R. Glass,et al.  Robust Speaker Recognition in Noisy Conditions , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[16]  Thomas Quatieri,et al.  Discrete-Time Speech Signal Processing: Principles and Practice , 2001 .

[17]  Mohamed-Jalal Fadili,et al.  Morphological Component Analysis: An Adaptive Thresholding Strategy , 2007, IEEE Transactions on Image Processing.

[18]  Philipos C. Loizou,et al.  A noise-estimation algorithm for highly non-stationary environments , 2006, Speech Commun..

[19]  Jeff A. Bilmes,et al.  MVA Processing of Speech Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Patrick Kenny,et al.  Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.