Speech pre-enhancement using a discriminative microscopic intelligibility model

We propose a new approach to optimally pre-enhancing speech signals for given noise conditions. Like others, we optimise the predicted intelligibility of the signal; however, we employ a statistical ‘microscopic’ intelligibility model that encodes information about which spectro-temporal speech regions are most informative. Uniquely, our optimisation strategy aims to maximise the discrimination between the correct interpretation of the utterance and competing incorrect interpretations. We present results from studies that use speech-shaped stationary noise maskers and show that the new strategy leads to solutions that are more varied than the simple high-frequency emphasis employed in many pre-enhancement systems.
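
One way to make the discriminative criterion concrete is sketched below. The notation is assumed for illustration and is not taken from the paper: g is a vector of modification parameters (for example, spectral gains), X_g is the modified speech as observed in the noise, W* is the correct interpretation of the utterance, W ranges over competing interpretations, and p(. | W) is the score assigned by the microscopic intelligibility model. The constant-energy constraint E(g) = E_0 is likewise an assumption, included because pre-enhancement is usually studied with the output level held fixed.

```latex
% Hypothetical notation: g, X_g, W^*, W, E(g) and E_0 are illustrative assumptions.
\[
\hat{\mathbf{g}}
  = \arg\max_{\mathbf{g}}
    \Big[ \log p(\mathbf{X}_{\mathbf{g}} \mid W^{*})
        - \max_{W \neq W^{*}} \log p(\mathbf{X}_{\mathbf{g}} \mid W) \Big]
  \quad \text{subject to} \quad E(\mathbf{g}) = E_{0}
\]
```

Under this reading, a modification is rewarded only to the extent that it widens the margin between the correct interpretation and its strongest competitor, rather than for raising the score of the correct interpretation in isolation.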
