Speech intelligibility in noise with varying spatial acoustics under Ambisonics-based sound reproduction system

Abstract: The present study investigates the effect of noise on speech intelligibility under an Ambisonics-based sound reproduction system, a technique commonly used to realise audio virtual reality. The sound reproduction system was built in an anechoic chamber using a 16-channel spherical loudspeaker array. A commercially available first-order Ambisonics microphone array was used to collect room impulse responses of rooms with different acoustic characteristics, and a decoder was used to generate the signals rendered by the loudspeaker array. A subjective listening test was conducted in the sound reproduction system to examine how the intelligibility of sentences is compromised by noise in virtually reproduced acoustic environments with various acoustical conditions. Particular focus was given to the amount of reverberation in the rooms and the angular separation between the target speech and noise sources, both of which are known to have a significant effect on speech intelligibility in real (i.e. non-virtual) acoustic environments. The experimental results suggest that the intelligibility of speech masked by pink noise was significantly reduced in reverberant acoustical environments reproduced by the Ambisonics-based sound reproduction system compared to the baseline anechoic environment. However, only marginal differences were observed between acoustical environments with different reverberation times. Spatial release from masking was also observed in the reproduced acoustic environments and showed some correlation with the accuracy of source localisation realised by the sound reproduction system.
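The abstract does not specify the decoder or the loudspeaker geometry, so the following is only an illustrative sketch of how a first-order Ambisonics signal might be decoded to a loudspeaker array. It assumes SN3D-normalised W, X, Y, Z channels, a basic sampling decoder, and a hypothetical 8-loudspeaker cube layout (not the 16-channel array used in the study). All function names are introduced here for illustration. The velocity vector is used as a simple predictor of the reproduced source direction.

```python
import numpy as np


def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction given in degrees."""
    az, el = np.radians([azimuth_deg, elevation_deg])
    return np.array([np.cos(az) * np.cos(el),
                     np.sin(az) * np.cos(el),
                     np.sin(el)])


def encode_foa(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal as first-order W, X, Y, Z (SN3D assumed).

    Returns an array of shape (4, n_samples).
    """
    signal = np.asarray(signal, dtype=float)
    gains = np.concatenate(([1.0], unit_vector(azimuth_deg, elevation_deg)))
    return gains[:, None] * signal[None, :]


def cube_layout():
    """Hypothetical layout: 8 loudspeakers at the vertices of a cube."""
    signs = np.array([[x, y, z]
                      for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                     dtype=float)
    return signs / np.sqrt(3.0)  # normalise to unit vectors


def sampling_decoder(speaker_dirs):
    """Basic sampling decoder matrix of shape (n_speakers, 4).

    The factor 3 on X, Y, Z follows from the SN3D assumption.
    """
    n = len(speaker_dirs)
    return np.hstack([np.ones((n, 1)), 3.0 * speaker_dirs]) / n


def velocity_vector(gains, speaker_dirs):
    """Gain-weighted mean loudspeaker direction (low-frequency cue)."""
    return gains @ speaker_dirs / gains.sum()


# Usage: a unit impulse placed at 30 degrees azimuth, 0 degrees elevation.
speakers = cube_layout()
decoder = sampling_decoder(speakers)
b_format = encode_foa(np.ones(1), 30.0, 0.0)
feeds = decoder @ b_format              # loudspeaker signals, shape (8, 1)
r_v = velocity_vector(feeds[:, 0], speakers)
```

For this regular layout the velocity vector points in the encoded direction. Irregular arrays or other decoder designs would generally need a different decoding matrix.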
