Speech reinforcement with a globally optimized perceptual distortion measure for noisy reverberant channels

This paper proposes a time-frequency weighting for speech reinforcement (near-end listening enhancement) in noisy and reverberant environments. The weighting optimizes a perceptual distortion measure globally over a number of time-frequency bins. Simulations confirm the optimality of the algorithm, and the method is compared with three reference methods using two additional instrumental measures.
