Noise Reduction and Increased VAD Accuracy Using Spectral Subtraction

This paper shows that performing voice activity detection (VAD) on the output of a spectral subtraction noise-reduced signal increases the accuracy of the VAD and reduces its sensitivity to fixed thresholds. An initial VAD decision is used to control the noise estimate update in the spectral subtraction algorithm. The more accurate VAD decision obtained after the first spectral subtraction is then used to reprocess the original noisy speech via spectral subtraction, reducing the noise without attenuating the speech. Auditory masking thresholds were used to weight the spectral subtraction to avoid introducing musical noise artifacts. Energy thresholds were used to detect voiced frames of speech recorded inside a car at an 8 kHz sampling rate and combined with four different noise conditions. The received noise and the speech were combined to produce inputs to the algorithm at 0, 5, and 10 dB SNR, and the VAD accuracy consistently increased after spectral subtraction. However, when the VAD and spectral subtraction were iterated more than twice on the signal, the VAD accuracy started to decrease. Visual inspection of the clean speech was used to determine which frames should be classified as voice, and these labels served as the reference for measuring the accuracy of the VAD algorithm. The VAD accuracy increased by only a few percent in each case, but this small improvement matters considerably when the resulting decisions control the noise estimate of the spectral subtraction algorithm, because it helps avoid attenuating the speech. Modifying the fixed offset for detecting voice had less of an effect when the VAD operated on the signal after spectral subtraction than when it operated on the original signal, which can be attributed to the reduced variance in the noise. Objective speech quality measures show that the algorithm removes a large amount of the stationary noise in the hands-free environment of an automobile with relatively little speech distortion.
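
The two-pass scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the signal has already been split into frames of magnitude spectra, that the first few frames contain only noise, and that plain magnitude subtraction with a spectral floor is an acceptable stand-in for the masking-threshold weighting used in the paper. All function names and parameter values (`offset_db`, `init_frames`, `alpha`, `floor_gain`) are hypothetical and chosen only for illustration.

```python
import math


def frame_energy_db(mag):
    """Mean power of one magnitude-spectrum frame, in dB."""
    power = sum(m * m for m in mag) / len(mag)
    return 10.0 * math.log10(power + 1e-12)


def energy_vad(frames, offset_db=3.0, init_frames=5):
    """Energy-threshold VAD: a frame counts as speech when its energy
    exceeds the noise floor by a fixed offset. The floor is the mean
    energy of the first init_frames frames (assumed noise-only)."""
    floor = sum(frame_energy_db(f) for f in frames[:init_frames]) / init_frames
    return [frame_energy_db(f) > floor + offset_db for f in frames]


def spectral_subtract(frames, is_speech, alpha=0.9, floor_gain=0.05):
    """Magnitude spectral subtraction. The noise estimate is updated only
    in frames the VAD marks as non-speech (assumed smoothing factor alpha).
    A spectral floor replaces the paper's auditory masking weighting."""
    noise = list(frames[0])
    out = []
    for mag, speech in zip(frames, is_speech):
        if not speech:
            noise = [alpha * n + (1.0 - alpha) * m for n, m in zip(noise, mag)]
        out.append([max(m - n, floor_gain * m) for m, n in zip(mag, noise)])
    return out


def two_pass_enhance(frames, offset_db=3.0):
    """Pass 1: initial VAD on the noisy frames drives spectral subtraction.
    Pass 2: VAD on the pass-1 output drives a second spectral subtraction
    applied to the ORIGINAL noisy frames."""
    vad1 = energy_vad(frames, offset_db)
    pass1 = spectral_subtract(frames, vad1)
    vad2 = energy_vad(pass1, offset_db)
    enhanced = spectral_subtract(frames, vad2)
    return vad1, vad2, enhanced
```

Calling `two_pass_enhance(frames)` returns the initial VAD decisions, the refined decisions, and the enhanced magnitude frames. The second pass deliberately restarts from the original noisy frames rather than from the pass-1 output, which mirrors the reprocessing step described above.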