Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression

In this paper, we propose a non-parametric approach to improving the intelligibility of speech in noise. The signal is enhanced before being presented in a noisy environment, under the constraint that the global signal power is equal before and after modification. Two systems are combined in cascade to enhance the signal first in frequency (spectral shaping) and then in time (dynamic range compression). Experiments with speech-shaped noise (SSN) and competing-speaker (CS) noise at various low SNR values show that the proposed approach outperforms state-of-the-art methods in terms of the Speech Intelligibility Index (SII). In terms of SNR gain, there is an improvement of 7 dB (SSN) and 8 dB (CS) over these methods. A formal listening test confirms the efficiency of the proposed system in enhancing speech intelligibility in noise.
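The cascade structure described above can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the first-order pre-emphasis filter, the moving-average envelope, the power-law gain, and all names and parameter values (`pre_emphasis`, `compress`, `match_power`, `coeff`, `ratio`, `win`) are assumptions chosen only to show the order of operations, namely shaping in frequency, compression in time, and a final rescaling that enforces the equal-power constraint.

```python
import numpy as np

def pre_emphasis(x, coeff=0.9):
    # Stand-in for spectral shaping: a first-order high-frequency boost.
    # (Assumption: the paper's shaping filter is not specified here.)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - coeff * x[:-1]
    return y

def compress(x, ratio=3.0, win=64, eps=1e-8):
    # Stand-in for dynamic range compression: estimate a smoothed
    # magnitude envelope, then boost low-level regions relative to the peak.
    kernel = np.ones(win) / win
    env = np.convolve(np.abs(x), kernel, mode="same") + eps
    gain = (env / env.max()) ** (1.0 / ratio - 1.0)  # >= 1 wherever env <= max
    return x * gain

def match_power(y, x, eps=1e-12):
    # Rescale y so that its global mean power equals that of x.
    return y * np.sqrt(np.mean(x ** 2) / (np.mean(y ** 2) + eps))

def enhance(x):
    # Cascade: frequency-domain shaping, then time-domain compression,
    # then renormalisation to the input power.
    x = np.asarray(x, dtype=float)
    return match_power(compress(pre_emphasis(x)), x)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Toy input: a loud half followed by a quiet half.
    level = np.where(t < 0.5, 1.0, 0.05)
    x = level * np.sin(2 * np.pi * 440 * t)
    y = enhance(x)
    print("input power :", np.mean(x ** 2))
    print("output power:", np.mean(y ** 2))
```

In this toy setting the output keeps the input's global power while the level difference between the loud and quiet halves shrinks, which is the qualitative effect the compression stage is meant to have.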
