What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering

Mel-frequency cepstral coefficients (MFCCs) are usually derived from a Hamming-windowed DFT spectrum. In this paper, we advocate using the so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. They provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method, versus 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.
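
To make the idea of "multiple window functions and frequency-domain averaging" concrete, the following is a minimal sketch of a multitaper power spectrum for a single speech frame, next to the conventional single Hamming window estimate. It is an illustration under stated assumptions, not the configuration used in the experiments: the abstract does not say which taper family, how many tapers, or what weighting is used, so the sketch assumes sine tapers, six tapers, and uniform weights. The function names `sine_tapers`, `multitaper_spectrum`, and `hamming_spectrum` are hypothetical.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return a (num_tapers, frame_len) array of orthonormal sine tapers.

    Assumption: sine tapers are used only as a simple, closed-form example
    of a taper family; the abstract does not name the tapers it uses.
    """
    n = np.arange(1, frame_len + 1)                # sample index 1..N
    k = np.arange(1, num_tapers + 1)[:, None]      # taper index 1..K
    scale = np.sqrt(2.0 / (frame_len + 1))
    return scale * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_spectrum(frame, num_tapers=6, n_fft=None):
    """Average the power spectra obtained with several tapers.

    Assumption: uniform weights (a plain mean) over the tapers.
    """
    frame = np.asarray(frame, dtype=float)
    tapers = sine_tapers(len(frame), num_tapers)
    # One windowed DFT per taper; rows are tapers, columns are frequency bins.
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    # Frequency-domain averaging across tapers.
    return spectra.mean(axis=0)


def hamming_spectrum(frame, n_fft=None):
    """Conventional single-window estimate, shown for comparison."""
    frame = np.asarray(frame, dtype=float)
    windowed = np.hamming(len(frame)) * frame
    return np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
```

In an MFCC front end, either estimate would then be passed through the usual mel filterbank, logarithm, and discrete cosine transform; only the spectrum estimation step differs. Averaging over several tapers lowers the variance of the estimate at the cost of some frequency resolution, which is the trade-off behind the robustness claim.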
