Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility

This paper introduces a novel approach to real-time speech modulation enhancement that increases speech intelligibility in noise. The proposed technique operates independently in the frequency and time domains. In the frequency domain, a compression function reallocates energy within a frame; it includes novel scaling operations intended to preserve speech quality. In the time domain, an equation is introduced that reallocates energy from the louder to the quieter parts of the speech. This equation preserves the long-term energy of the speech regardless of the amount of compression, which gives full control over the time-domain energy reallocation in real time. Intelligibility and quality evaluations show that the approach increases speech intelligibility while maintaining the overall energy and quality of the speech signal.
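
The idea of moving energy from louder to quieter segments while keeping the total energy fixed can be illustrated with a toy example. The sketch below is not the equation proposed in the paper, which the abstract does not state. It assumes a simple power-law compression of per-frame energies followed by a global renormalization, and it processes the whole signal offline rather than in real time. The function name `reallocate_energy` and the parameters `frame_len` and `alpha` are hypothetical.

```python
import numpy as np

def reallocate_energy(x, frame_len=160, alpha=0.5):
    """Toy frame-wise energy reallocation (illustrative assumption only).

    alpha = 1.0 leaves the signal unchanged; smaller alpha moves more
    energy from loud frames to quiet frames. Total energy is preserved.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)

    energy = np.sum(frames ** 2, axis=1)      # per-frame energy
    active = energy > 0                       # silent frames stay silent

    # Power-law compression of the frame energies (assumed form).
    target = np.zeros_like(energy)
    target[active] = energy[active] ** alpha

    # Rescale so the sum of target energies equals the original total.
    if target.sum() > 0:
        target *= energy.sum() / target.sum()

    # Per-frame amplitude gain that realizes the target energies.
    gain = np.zeros_like(energy)
    gain[active] = np.sqrt(target[active] / energy[active])

    return (frames * gain[:, None]).reshape(-1)

# Usage: a loud segment followed by a quiet one.
rng = np.random.default_rng(0)
x = np.concatenate([rng.standard_normal(1600), 0.1 * rng.standard_normal(1600)])
y = reallocate_energy(x, frame_len=160, alpha=0.5)
```

Because the renormalization step depends only on the ratio of total energies, the output energy matches the input energy for any value of `alpha`. That mirrors the property the abstract describes, but a real-time system would need a causal estimate of the long-term energy instead of the whole-signal sum used here.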
