A speech preprocessing strategy for intelligibility improvement in noise based on a perceptual distortion measure

A speech pre-processing algorithm is presented to improve speech intelligibility in noise for the near-end listener. The algorithm improves intelligibility by redistributing the speech energy over time and frequency so that it is optimal with respect to a perceptual distortion measure, which is based on a spectro-temporal auditory model. In contrast to spectral-only models, short-time information is taken into account. As a consequence, the algorithm is more sensitive to transient regions, which therefore receive more amplification than stationary vowels. It is known from the literature that changing the vowel-transient energy ratio is beneficial for speech intelligibility in noise. Objective intelligibility prediction results show that the proposed method yields higher predicted speech intelligibility in noise than two reference methods, without modifying the global speech energy.
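The constraint that the global speech energy stays unchanged can be illustrated with a minimal sketch. It is not the optimization described above: the per-unit weights here are hypothetical, hand-picked values rather than the output of a perceptual distortion measure, and the function name `redistribute_energy` and the toy magnitudes are assumptions made for illustration. The sketch only shows how boosting some time-frequency units and then renormalizing shifts the vowel-transient energy ratio while preserving total energy.

```python
import math


def redistribute_energy(tf_magnitudes, weights):
    """Apply per-unit energy weights, then rescale so total energy is unchanged.

    tf_magnitudes: list of frames, each a list of band magnitudes (hypothetical layout).
    weights: same shape, non-negative energy weights (hypothetical, not from the paper).
    """
    # An energy weight w corresponds to an amplitude gain of sqrt(w).
    scaled = [
        [m * math.sqrt(w) for m, w in zip(frame_m, frame_w)]
        for frame_m, frame_w in zip(tf_magnitudes, weights)
    ]
    energy_in = sum(m * m for frame in tf_magnitudes for m in frame)
    energy_out = sum(m * m for frame in scaled for m in frame)
    if energy_out == 0.0:
        # Nothing to normalize; return an unmodified copy.
        return [list(frame) for frame in tf_magnitudes]
    # A single global factor restores the original total energy.
    norm = math.sqrt(energy_in / energy_out)
    return [[m * norm for m in frame] for frame in scaled]


def frame_energy(frame):
    return sum(m * m for m in frame)


# Toy input: frame 0 stands in for a stationary vowel, frame 1 for a weak transient.
frames = [[1.0, 1.0], [0.2, 0.2]]
# Hypothetical weights that favor the transient frame.
weights = [[1.0, 1.0], [4.0, 4.0]]

modified = redistribute_energy(frames, weights)
ratio_before = frame_energy(frames[1]) / frame_energy(frames[0])
ratio_after = frame_energy(modified[1]) / frame_energy(modified[0])
```

In this toy case the transient-to-vowel energy ratio rises by the chosen weight factor, while the summed energy over all units is the same before and after.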
