Paper information - FFT-Based Block Processing in Speech Enhancement: Potential Artifacts and Solutions

Most speech enhancement applications perform frequency shaping by multiplication in the frequency domain, which is equivalent to convolution in the time domain. In these algorithms, updating the frequency response alone does not guarantee the conditions under which multiplication in frequency corresponds to linear rather than circular convolution. As a result, artifacts and distortions may be present in the output of a standard fast Fourier transform (FFT)-based algorithm. Typical methods for dealing with these artifacts involve overlapping and windowing. However, even with these strategies, artifacts may be perceptually noticeable under certain signal-to-noise ratio (SNR) conditions and/or when a high sampling frequency is employed. This paper analyzes the efficiency of the standard methods, explains the source of these distortions, provides perceptual evidence of the artifacts, and proposes two alternative methods for artifact-free and distortion-free FFT convolution. These methods are based on extending the impulse response and on splitting the impulse response into two impulse responses, both operations being performed in the frequency domain. The computational cost and performance of the proposed techniques are also discussed.
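The distinction between circular and linear convolution that the abstract relies on can be illustrated with a small sketch. It is not taken from the paper: the block, the impulse response, and all function names are hypothetical, and a naive DFT stands in for an FFT so that the example needs only the Python standard library. It shows that multiplying N-point spectra matches linear convolution only when N is at least L + M - 1 for a block of length L and an impulse response of length M.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT; stands in for an FFT to keep the sketch dependency-free."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def linear_convolution(x, h):
    """Direct time-domain linear convolution, used as the reference."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def frequency_domain_filter(x, h, n):
    """Multiply the n-point spectra of x and h, then transform back.

    The result is the n-point circular convolution of x and h.
    """
    xp = list(x) + [0.0] * (n - len(x))
    hp = list(h) + [0.0] * (n - len(h))
    spectrum = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [v.real for v in dft(spectrum, inverse=True)]


block = [1.0, 2.0, 3.0, 4.0]   # hypothetical signal block, length L = 4
impulse = [1.0, 1.0, 1.0]      # hypothetical impulse response, length M = 3

reference = linear_convolution(block, impulse)
# Transform size equal to the block length: the tail wraps around (time aliasing).
wrapped = frequency_domain_filter(block, impulse, n=4)
# Transform size L + M - 1 = 6: circular and linear convolution coincide.
padded = frequency_domain_filter(block, impulse, n=6)

print([round(v, 6) for v in reference])
print([round(v, 6) for v in wrapped])
print([round(v, 6) for v in padded])
```

With the 4-point transform, the last two samples of the linear result fold back onto the first two, so the block starts with 8 and 7 instead of 1 and 3. Zero-padding removes the wrap-around here only because the impulse response is known to be short. When a gain is set directly in the frequency domain, as in the enhancement algorithms the abstract describes, the length of the effective impulse response is not controlled in this way.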

D. V. Anderson | J. I. Marin-Hurtado
