Intelligibility Enhancement Based on Mutual Information

Speech intelligibility enhancement is considered for multiple-microphone acquisition and single-loudspeaker rendering. The approach is based on the mutual information between the message spoken in the far-end environment and the message perceived by a listener at the near end. We prove that the jointly optimal processing can be decomposed into far-end and near-end processing. The former is a minimum variance distortionless response (MVDR) beamformer that reduces the noise in the talker's environment; the latter is a post-filter that redistributes the power over the frequency bands. This disjoint processing is optimal provided that the post-filter is aware of the residual noise left by the beamformer. Our results show that both processing steps are necessary to convey the message effectively and that this residual-noise awareness of the post-filter is essential. In addition, we study mutual information computed on the perceptually more relevant powers per critical band.
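The far-end step can be illustrated for a single frequency band with the textbook MVDR solution w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance across microphones and d the steering vector toward the talker. This is a minimal sketch, not the paper's implementation: the half-wavelength linear array, the talker and interferer angles, and the covariance values are illustrative assumptions, and the helper names (`ula_steering`, `mvdr_weights`, `output_noise_power`) are hypothetical. The residual output noise power wᴴRw is the quantity a residual-aware post-filter would need.

```python
import numpy as np


def ula_steering(num_mics: int, angle_rad: float) -> np.ndarray:
    """Steering vector of a hypothetical half-wavelength uniform linear array."""
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(angle_rad))


def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency band."""
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)


def output_noise_power(noise_cov: np.ndarray, weights: np.ndarray) -> float:
    """Noise power remaining at the beamformer output, w^H R w."""
    return float(np.real(weights.conj() @ noise_cov @ weights))


# Toy band (illustrative values): 4 mics, talker at broadside,
# one interferer at 40 degrees plus spatially white sensor noise.
M = 4
d = ula_steering(M, 0.0)
v = ula_steering(M, np.deg2rad(40.0))
R = np.eye(M) + 2.0 * np.outer(v, v.conj())

w = mvdr_weights(R, d)
residual = output_noise_power(R, w)             # equals 1 / (d^H R^{-1} d)
delay_and_sum = output_noise_power(R, d / M)    # baseline for comparison

print(f"distortionless response w^H d = {np.real(w.conj() @ d):.6f}")
print(f"residual noise power: MVDR {residual:.4f}, delay-and-sum {delay_and_sum:.4f}")
```

The distortionless constraint wᴴd = 1 leaves the talker's speech untouched, so any noise that survives the beamformer reaches the near end unless the post-filter accounts for it.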
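The near-end power redistribution can be sketched under a toy model that we assume here and that is not necessarily the paper's exact formulation: independent Gaussian bands, a beamformer output consisting of speech power s_k plus residual noise power r_k, a post-filter gain g_k, additive near-end noise n_k, per-band mutual information ½log₂(1 + g_k²s_k / (g_k²r_k + n_k)), and a fixed total output power. Writing p_k = g_k²(s_k + r_k), each band's term is concave in p_k, so the optimum follows from the KKT conditions, with the Lagrange multiplier found by bisection. Setting r_k = 0 in the allocation yields a residual-unaware post-filter (classic water-filling) for comparison. All function names and numbers are hypothetical.

```python
import numpy as np


def band_power(lam, a, b, n):
    """Per-band power satisfying f_k'(p_k) = lam, clipped at zero.

    f_k(p) = ln(p + n_k) - ln(b_k p + n_k) is concave; its stationary point solves
    b p^2 + n (1 + b) p + n^2 - n a / lam = 0. The nonnegative root is written
    in a cancellation-free form that also covers b = 0 (water-filling).
    """
    B = n * (1.0 + b)
    C = n * n - n * a / lam
    D = B * B - 4.0 * b * C
    return np.maximum(-2.0 * C / (B + np.sqrt(np.maximum(D, 0.0))), 0.0)


def allocate_power(speech, residual, near_noise, total_power, iters=200):
    """Bisect the multiplier lam so the band powers sum to total_power (speech > 0 assumed)."""
    speech, residual, near_noise = (np.asarray(x, dtype=float) for x in (speech, residual, near_noise))
    a = speech / (speech + residual)      # speech share of the beamformer output
    b = residual / (speech + residual)    # residual-noise share (0 means unaware)
    hi = np.max(a / near_noise)           # at this lam every band gets zero power
    lo = hi
    while band_power(lo, a, b, near_noise).sum() < total_power:
        lo *= 0.5                         # a smaller lam assigns more power
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if band_power(mid, a, b, near_noise).sum() > total_power:
            lo = mid
        else:
            hi = mid
    return band_power(hi, a, b, near_noise)


def mutual_information_bits(power, speech, residual, near_noise):
    """Toy-model mutual information, summed over bands, for output powers `power`."""
    power, speech, residual, near_noise = (np.asarray(x, dtype=float) for x in (power, speech, residual, near_noise))
    b = residual / (speech + residual)
    return float(np.sum(0.5 * np.log2((power + near_noise) / (b * power + near_noise))))


# Hypothetical band powers, e.g. after an MVDR beamformer.
speech = np.array([4.0, 3.0, 2.0, 1.0, 0.5, 0.2])
residual = np.array([0.1, 0.2, 0.5, 1.0, 1.0, 0.8])
near_noise = np.array([0.5, 0.5, 1.0, 1.0, 2.0, 2.0])
total = float(np.sum(speech + residual))       # keep the unprocessed output power

p_aware = allocate_power(speech, residual, near_noise, total)
p_unaware = allocate_power(speech, np.zeros_like(residual), near_noise, total)
gains = np.sqrt(p_aware / (speech + residual))  # post-filter amplitude gains

for name, p in (("identity", speech + residual), ("unaware", p_unaware), ("aware", p_aware)):
    print(f"{name:8s} {np.round(p, 3)} {mutual_information_bits(p, speech, residual, near_noise):.4f} bits")
```

Because the residual-aware allocation maximizes the same concave objective under the same power constraint, it can never do worse than the unaware one in this toy model; it typically moves power away from bands where the beamformer left little speech relative to residual noise.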