Improved MFCC-based feature for robust speaker identification

Abstract The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of the mismatches between training and testing. In this paper, the logarithmic transformation in the standard MFCC analysis is replaced by a combined function to improve the noisy sensitivity. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median-filter to further suppress the noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.

[1]  Steve Young,et al.  A review of large-vocabulary continuous-speech recognition , 1996 .

[2]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[3]  Richard J. Mammone,et al.  New LP-derived features for speaker identification , 1994, IEEE Trans. Speech Audio Process..

[4]  R. P. Ramachandran,et al.  Robust speaker recognition: a feature-based approach , 1996, IEEE Signal Processing Magazine.

[5]  Mark J. F. Gales,et al.  Robust continuous speech recognition using parallel model combination , 1996, IEEE Trans. Speech Audio Process..

[6]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[7]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.