The predictive differential amplitude spectrum for robust speaker recognition in stationary noises

The performance of a speaker identification system degrades seriously when the acoustic conditions at test time do not match those at training time. In this paper, we propose a method that restores clean speech from noisy speech in two steps: 1) a predictive difference function is used to estimate the differential amplitude spectra (DAS) from both the left side and the right side of the amplitude spectrum of the noisy speech, so as to eliminate the noise as precisely as possible, and 2) the average of the left-side and right-side integrated DASs is taken as the estimated amplitude spectrum of the original clean speech. This estimated amplitude spectrum then replaces the spectrum in the traditional mel-frequency cepstral coefficient (MFCC) calculation, and the resulting features are referred to as predictive differential amplitude spectrum (PDAS) based cepstral coefficients (PDASCCs). We compare PDASCCs with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDASCCs are more effective in enhancing the robustness of a speaker recognition system and that, when combined with the CMS method, the average error rate can be reduced by 7.5%.
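
The two steps can be illustrated with a minimal sketch. It rests on a strong assumption: the abstract does not define the predictive difference function, so a plain first-order difference is used in its place, and the function name `integrated_das_estimate` is hypothetical. The sketch shows only why differencing helps against a flat, stationary noise floor: a constant added to every frequency bin cancels in the differences, so the averaged integrals are unchanged.

```python
import numpy as np


def integrated_das_estimate(amplitude):
    """Average of left-side and right-side integrated differential spectra.

    Assumption: a plain first-order difference stands in for the
    predictive difference function, which the abstract does not specify.
    """
    a = np.asarray(amplitude, dtype=float)

    # Left-side DAS: d_left[k] = a[k] - a[k-1], with d_left[0] = 0.
    d_left = np.diff(a, prepend=a[0])
    # Right-side DAS: d_right[k] = a[k] - a[k+1], with d_right[-1] = 0.
    d_right = -np.diff(a, append=a[-1])

    # Integrate each DAS starting from its own side of the spectrum.
    left_integral = np.cumsum(d_left)                # equals a[k] - a[0]
    right_integral = np.cumsum(d_right[::-1])[::-1]  # equals a[k] - a[-1]

    # Step 2: average the two integrals.
    return 0.5 * (left_integral + right_integral)


# Toy amplitude spectrum of one frame, and the same frame with a flat offset.
clean = np.array([1.0, 3.0, 2.0, 5.0])
noisy = clean + 0.7

estimate_clean = integrated_das_estimate(clean)
estimate_noisy = integrated_das_estimate(noisy)
```

With a plain difference the result is the input spectrum shifted by the mean of its two end bins, so it can be negative and `estimate_noisy` equals `estimate_clean`. Real stationary noise is not perfectly flat across frequency, which is presumably where a predictive difference function would matter; this sketch does not attempt to model that.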