Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement

In modulation-filtering-based speech enhancement, noise is suppressed by bandpass filtering the temporal trajectories of the power spectrum. In the literature, some authors apply modulation filtering to the power spectrum directly, while others first apply a compression function to reduce its dynamic range. This paper systematically compares different dynamic range compression functions applied to the power spectrum for speech enhancement. Subjective listening tests and objective measures are used to evaluate both the quality and the intelligibility of the enhanced speech. Quality is measured objectively with the Perceptual Evaluation of Speech Quality (PESQ) measure, and intelligibility with the Speech Transmission Index (STI) measure. It is found that P^0.3333 (the power spectrum raised to the power 1/3) results in the highest speech quality and intelligibility.
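As a rough illustration of the processing chain described in the abstract, the sketch below compresses the power trajectory of a single frequency bin, filters it along time, and expands it again. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the three-tap filter `[0.5, 0.0, -0.5]`, and the half-wave rectification before expansion are all hypothetical choices made for the example, and the abstract does not specify the filter the paper uses.

```python
import math


def compress(power, exponent=1.0 / 3.0):
    """Raise each non-negative power value to `exponent` (1/3 gives P^0.3333)."""
    return [p ** exponent for p in power]


def expand(compressed, exponent=1.0 / 3.0):
    """Invert `compress`; negative inputs are clipped to zero first (an assumption)."""
    return [max(c, 0.0) ** (1.0 / exponent) for c in compressed]


def dynamic_range_db(power):
    """Ratio of the largest to the smallest power value, in decibels."""
    return 10.0 * math.log10(max(power) / min(power))


def fir_filter(x, taps):
    """Causal FIR filter with zero initial state, applied along the time axis."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def enhance_trajectory(power_trajectory, taps, exponent=1.0 / 3.0):
    """Compress, modulation-filter, and expand one bin's power trajectory."""
    compressed = compress(power_trajectory, exponent)
    filtered = fir_filter(compressed, taps)
    return expand(filtered, exponent)


# Hypothetical filter with zeros at DC and at the Nyquist modulation frequency,
# i.e. a very crude bandpass; a real system would use a designed filter.
EXAMPLE_TAPS = [0.5, 0.0, -0.5]

if __name__ == "__main__":
    trajectory = [1.0, 1.0e6]
    print(dynamic_range_db(trajectory))            # range before compression
    print(dynamic_range_db(compress(trajectory)))  # range after compression
    print(enhance_trajectory([8.0] * 6, EXAMPLE_TAPS))
```

Because the exponent 1/3 scales the logarithm of the power by one third, a trajectory spanning 60 dB spans about 20 dB after compression, which is the dynamic range reduction the abstract refers to. A constant trajectory is driven to zero once the filter's start-up transient has passed, since the example filter rejects the zero-frequency modulation component.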
