Analyzing noise robustness of MFCC and GFCC features in speaker identification

Automatic speaker recognition can achieve a high level of performance in matched training and testing conditions. However, such performance drops significantly in mismatched noisy conditions. Recent research indicates that a new speaker feature, gammatone frequency cepstral coefficients (GFCC), exhibits superior noise robustness to commonly used mel-frequency cepstral coefficients (MFCC). To gain a deep understanding of the intrinsic robustness of GFCC relative to MFCC, we design speaker identification experiments to systematically analyze their differences and similarities. This study reveals that the nonlinear rectification accounts for the noise robustness differences primarily. Moreover, this study suggests how to enhance MFCC robustness, and further improve GFCC robustness by adopting a different time-frequency representation.

[1]  James R. Glass,et al.  Robust Speaker Recognition in Noisy Conditions , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Daniel Garcia-Romero,et al.  Linear versus mel frequency cepstral coefficients for speaker recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[3]  A. W. M. van den Enden,et al.  Discrete Time Signal Processing , 1989 .

[4]  Lukás Burget,et al.  Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  DeLiang Wang,et al.  CASA-Based Robust Speaker Identification , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Alan V. Oppenheim,et al.  Discrete-time signal processing (2nd ed.) , 1999 .

[8]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[9]  D. Wang,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.

[10]  DeLiang Wang,et al.  Incorporating Auditory Feature Uncertainties in Robust Speaker Identification , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.