Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

This paper presents a case study of acoustic sensing analytics and technology applied to speech in reverberant conditions. Reverberation is one of the factors that make speech in indoor spaces difficult to understand. The problem is particularly acute in large spaces with few absorbing or diffusing surfaces. One natural remedy for improving speech intelligibility in such conditions is to speak more slowly, and algorithms exist that can reduce the rate of speech (RoS) in real time. The study therefore aims to find recommended RoS values as a function of the speech transmission index (STI) in different acoustic environments. In the experiments, speech intelligibility is investigated with a sentence test (for the Polish language) using six impulse responses recorded in spaces with different STIs. Fifteen subjects with normal hearing participated in the tests. Analysis of the results enabled us to propose a curve specifying the maximum RoS that still yields understandable speech under given acoustic conditions. This curve can be used in speech processing control technology as well as in compressive reverse acoustic sensing.
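
To make the idea of reducing RoS concrete, the following is a minimal sketch, not the algorithm used in the study. It assumes that RoS is expressed in some consistent unit (for example, syllables per second) and that a maximum acceptable RoS for the current room is already known. The function names `stretch_factor` and `ola_stretch`, the frame length, and the plain overlap-add scheme are all illustrative assumptions; practical systems typically use waveform-similarity or pitch-synchronous methods to avoid the artifacts that plain overlap-add introduces.

```python
import math


def stretch_factor(current_ros, max_ros):
    """Return the time-stretch factor (>= 1.0) that slows speech from
    current_ros down to max_ros. Both values are hypothetical inputs in
    the same unit, e.g. syllables per second."""
    if current_ros <= 0 or max_ros <= 0:
        raise ValueError("rates must be positive")
    # Never speed speech up: a factor of 1.0 means "leave unchanged".
    return max(1.0, current_ros / max_ros)


def ola_stretch(samples, factor, frame_len=256):
    """Naive overlap-add time stretch of a list of float samples.

    Frames are read every hop_in samples and written every hop_out
    samples, so the output is roughly `factor` times longer."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    hop_out = frame_len // 2          # 50 % overlap on the output side
    hop_in = hop_out / factor         # smaller input hop => slower speech
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(1, int((len(samples) - frame_len) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len            # accumulated window weight per sample
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for n in range(frame_len):
            idx = start_in + n
            x = samples[idx] if idx < len(samples) else 0.0  # zero-pad tail
            out[start_out + n] += window[n] * x
            norm[start_out + n] += window[n]
    # Divide by the accumulated window weight to keep the level constant.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


if __name__ == "__main__":
    # Hypothetical example: talker at 6 units/s, room tolerates 4 units/s.
    factor = stretch_factor(6.0, 4.0)
    tone = [math.sin(2.0 * math.pi * 5.0 * n / 4000) for n in range(4000)]
    slowed = ola_stretch(tone, factor)
    print(factor, len(tone), len(slowed))
```

In a control loop of the kind the abstract describes, the maximum RoS would be read from the proposed curve at the STI of the current room, and the resulting factor would drive the time-scale modification stage.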
